Health Checks

Use this module when you need nginx to actively probe backends, report readiness, and surface health metrics without relying on a separate monitoring system.

When to use this module

  • You need a consistent readiness answer across all nginx workers, not a per-worker guess.
  • You want active HTTP or HTTPS probes with configurable thresholds to detect backend failures.
  • You want Prometheus-format metrics for health state in your monitoring stack.
  • You need your upstream balancer to exclude unhealthy or recovering peers from traffic.
  • You want per-upstream and per-peer probe visibility alongside service-level health.

nginx.conf synthesis

Put the health and readiness endpoints on internal locations. Configure service-level probes and, when needed, per-upstream and per-peer probes.

http {
    server {
        # Liveness: answers 200 whenever nginx itself is running
        location /healthz {
            health_liveness;
        }

        # Readiness: driven by the service-level probe of the app health endpoint
        location /ready {
            health_readiness;
            health_probe http://127.0.0.1:8080/health;
            health_probe_interval 3000ms;
            health_probe_timeout 1000ms;
            health_probe_fails 3;
            health_probe_passes 2;
        }

        # JSON status for service, upstream, and peer probes
        location /health {
            health_status;
        }

        # Prometheus-format metrics
        location /metrics {
            health_metrics;
        }
    }
}
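
One way to keep these endpoints off the public interface is stock nginx access control. The sketch below assumes that "internal" means reachable only from loopback and a private monitoring network. The port 8081 and the 10.0.0.0/8 range are illustrative placeholders, not values this module requires.

```nginx
server {
    listen 8081;          # hypothetical port dedicated to health traffic

    allow 127.0.0.1;      # local checks
    allow 10.0.0.0/8;     # assumed private monitoring network
    deny  all;            # every other client receives 403

    location /healthz {
        health_liveness;
    }

    location /metrics {
        health_metrics;
    }
}
```

The allow and deny directives are evaluated in order, so the final deny all only applies to clients that matched none of the allow rules.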

For per-upstream probes that feed peer eligibility into the balancer:

upstream backend {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;

    health_upstream_probe http://10.0.0.11:8080/health;
    health_upstream_probe_interval 5000ms;
    health_upstream_probe_fails 2;
    health_upstream_probe_passes 1;
    health_upstream_probe_slow_start 30s;
}

This configuration probes each upstream, tracks health at the peer level, and lets the upstream balancer exclude unhealthy or slow-starting peers from selection.
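
A minimal sketch of routing traffic to the probed group, assuming the upstream block above is named backend and that no extra directive is needed on the proxy side for the balancer to consume probe state:

```nginx
server {
    listen 80;

    location / {
        # Stock proxy_pass; peer eligibility is decided by the probes
        # configured on the "backend" upstream group
        proxy_pass http://backend;
    }
}
```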

Directive reference

health_status

  • Contexts: location
  • Default: disabled

Enables the /health JSON endpoint. It reports service-level, per-upstream, and per-peer probe state in a single response.

health_liveness

  • Contexts: location
  • Default: disabled

Enables the /healthz liveness endpoint. Always returns 200 when nginx is alive, regardless of probe state.

health_readiness

  • Contexts: location
  • Default: disabled

Enables the /ready readiness endpoint. Returns 200 when the service-level probe passes and 503 when it fails.

health_metrics

  • Contexts: location
  • Default: disabled

Enables a Prometheus-format metrics endpoint that exports probe state, health transitions, and counters.

health_probe

  • Contexts: location
  • Default: none

Sets the service-level probe target URL. Format is http[s]://host:port/path.

health_probe_interval

  • Contexts: location
  • Default: 5000ms

How often the active probe fires. Lower values detect failure faster but increase probe traffic.

health_probe_timeout

  • Contexts: location
  • Default: 1000ms

Socket-level timeout for the probe connect, send, and receive phases.

health_probe_fails

  • Contexts: location
  • Default: 2

Consecutive failures needed before the probe target is marked unhealthy. Prevents flapping from transient errors.

health_probe_passes

  • Contexts: location
  • Default: 1

Consecutive successes needed before an unhealthy target is marked healthy again.
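
To see how the interval and the two thresholds interact, the fragment below annotates the values from the synthesis example with rough timing. The arithmetic assumes probes fire exactly on the interval and ignores probe timeouts and scheduling jitter, so treat the numbers as estimates rather than guarantees.

```nginx
location /ready {
    health_readiness;
    health_probe http://127.0.0.1:8080/health;

    health_probe_interval 3000ms;   # one probe every 3 s

    # Unhealthy after the 3rd consecutive failed probe:
    # about 6 to 9 s after the backend stops responding
    health_probe_fails 3;

    # Healthy again after the 2nd consecutive successful probe:
    # about 3 to 6 s after the backend recovers
    health_probe_passes 2;
}
```

Raising either threshold trades detection speed for stability: each extra required result adds one more interval before the state flips.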

health_probe_slow_start

  • Contexts: location
  • Default: 0 (disabled)

Duration after recovery during which the peer is kept out of balancer rotation. Use this to let a recovering backend warm up before receiving traffic.

health_probe_match

  • Contexts: location
  • Default: none

Match rules for the probe response. Format: status=<min>-<max> [body=<str>]. Only responses matching the rule count as successful.
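
A short sketch of the format above. The body string ok is an assumed example value, and the source does not say whether body= is an exact or substring comparison, so verify that against your build before relying on it.

```nginx
location /ready {
    health_readiness;
    health_probe http://127.0.0.1:8080/health;

    # Count only 2xx responses whose body matches "ok" as successful probes
    health_probe_match status=200-299 body=ok;
}
```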

health_worker_events_channel

  • Contexts: location
  • Default: none

Publishes service-level probe state transitions to the named channel in the worker-events default zone. Set this when other modules or tooling need to observe health flips.
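
As an illustration, the fragment below assumes the directive takes a single channel name and that the worker-events default zone is already configured elsewhere. The name health_flips is hypothetical.

```nginx
location /ready {
    health_readiness;
    health_probe http://127.0.0.1:8080/health;

    # Publish healthy/unhealthy transitions to a channel named "health_flips"
    health_worker_events_channel health_flips;
}
```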

health_upstream_probe

  • Contexts: upstream
  • Default: none

Sets a per-upstream probe target. Overrides the service-level probe for this upstream group.

health_upstream_probe_interval

  • Contexts: upstream
  • Default: 5000ms

Per-upstream probe interval.

health_upstream_probe_timeout

  • Contexts: upstream
  • Default: 1000ms

Per-upstream probe timeout.

health_upstream_probe_fails

  • Contexts: upstream
  • Default: 2

Per-upstream fail threshold.

health_upstream_probe_passes

  • Contexts: upstream
  • Default: 1

Per-upstream pass threshold.

health_upstream_probe_slow_start

  • Contexts: upstream
  • Default: 0

Per-upstream slow-start duration: how long a recovered peer is kept out of balancer rotation.

health_upstream_probe_match

  • Contexts: upstream
  • Default: none

Per-upstream match rules for probe response validation.

health_upstream_peer_probe

  • Contexts: upstream
  • Default: none

Sets a per-peer probe target. Format: <addr> <http[s]://host:port/path>. This directive is repeatable for multiple peers.
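
A sketch of the repeated form, reusing the peer addresses from the upstream example earlier on this page. It assumes <addr> is written exactly as it appears on the matching server line.

```nginx
upstream backend {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;

    # One probe target per peer, in the form <addr> <url>
    health_upstream_peer_probe 10.0.0.11:8080 http://10.0.0.11:8080/health;
    health_upstream_peer_probe 10.0.0.12:8080 http://10.0.0.12:8080/health;
}
```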

Variables

The module exports these nginx variables for use in logging or scripting:

Variable                        Description
$health_readiness               Returns 1 if the service-level probe is passing, 0 otherwise
$health_liveness                Returns 1 when nginx is alive
$health_backend_healthy_count   Number of backends currently passing their probes
$health_backend_total_count     Total number of tracked backends
$health_backend_failure_count   Number of backends currently failing their probes
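
For logging, these variables can be placed in a standard log_format string. The format name health and the log path below are hypothetical. log_format and access_log are stock nginx directives.

```nginx
http {
    # Append readiness and backend counts to each access log line
    log_format health '$remote_addr [$time_local] "$request" $status '
                      'ready=$health_readiness '
                      'healthy=$health_backend_healthy_count/$health_backend_total_count '
                      'failing=$health_backend_failure_count';

    server {
        listen 80;
        access_log /var/log/nginx/health.log health;
    }
}
```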

Behavior notes

  • Passive request and failure counters exclude the health endpoints themselves.
  • Probe results are shared across workers, but only worker 0 performs the periodic probe loops.
  • Liveness and readiness are intentionally different: nginx can be alive while readiness is failing.
  • Upstream peer selection consumes probe state through the upstream balancer, which excludes unhealthy peers and peers still inside slow-start recovery.
  • If a peer has no probe configured, it is treated as healthy (fail-open semantics).

Works well with

  • Stock nginx proxy_pass and upstream server directives because health checks feed peer eligibility into standard nginx upstream selection.
  • Upstream Balancer because it excludes unhealthy and slow-starting peers at request time.
  • Worker Events for publishing health transition notifications across workers.
  • Prometheus Metrics for scraping health state into monitoring.