Health Checks

Use this module when you need nginx to actively probe backends, report readiness, and surface health metrics without relying on a separate monitoring system.

When to use this module

You need a consistent readiness answer across all nginx workers, not a per-worker guess.
You want active HTTP or HTTPS probes with configurable thresholds to detect backend failures.
You want Prometheus-format metrics for health state in your monitoring stack.
You need your upstream balancer to exclude unhealthy or recovering peers from traffic.
You want per-upstream and per-peer probe visibility alongside service-level health.

nginx.conf synthesis

Expose the health and readiness endpoints on locations that are reachable only from your internal network. Configure service-level probes and, when needed, per-upstream and per-peer probes.

http {
    server {
        # Liveness: returns 200 whenever nginx is alive
        location /healthz {
            health_liveness;
        }

        # Readiness: backed by a service-level probe of the app health endpoint
        location /ready {
            health_readiness;
            health_probe http://127.0.0.1:8080/health;
            health_probe_interval 3000ms;
            health_probe_timeout 1000ms;
            health_probe_fails 3;
            health_probe_passes 2;
        }

        # JSON report of service-level, per-upstream, and per-peer probe state
        location /health {
            health_status;
        }

        # Prometheus-format metrics
        location /metrics {
            health_metrics;
        }
    }
}
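
The example above does not restrict who can reach the endpoints. One way to keep them internal, sketched here with the stock `allow` and `deny` directives, is to limit each location to trusted source ranges. The address ranges below are assumptions; substitute your own monitoring or orchestration network.

```nginx
location /metrics {
    health_metrics;

    # Assumed trusted ranges; replace with your own
    allow 127.0.0.1;
    allow 10.0.0.0/8;
    deny all;
}
```

The stock `internal` directive is usually not what you want here, because it rejects every request that arrives directly from a client, including those from an external prober or scraper.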

For per-upstream probes that feed peer eligibility into the balancer:

upstream backend {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;

    health_upstream_probe http://10.0.0.11:8080/health;
    health_upstream_probe_interval 5000ms;
    health_upstream_probe_fails 2;
    health_upstream_probe_passes 1;
    health_upstream_probe_slow_start 30s;
}

This configuration probes the `backend` upstream every 5 seconds, tracks health at the peer level, and lets the upstream balancer exclude peers that are unhealthy or still inside their 30-second slow-start window.
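
To send traffic through the probed group, reference it from a stock `proxy_pass`. A minimal sketch, assuming the `backend` upstream defined above; the `listen` port is illustrative.

```nginx
server {
    listen 80;

    location / {
        # Peer selection for this upstream uses the probe state,
        # so unhealthy or slow-starting peers are skipped
        proxy_pass http://backend;
    }
}
```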

Directive reference

`health_status`

Contexts: location
Default: disabled

Enables the /health JSON endpoint. It reports service-level, per-upstream, and per-peer probe state in a single response.

`health_liveness`

Contexts: location
Default: disabled

Enables the /healthz liveness endpoint. Always returns 200 when nginx is alive, regardless of probe state.

`health_readiness`

Contexts: location
Default: disabled

Enables the /ready readiness endpoint. Returns 200 when the service-level probe passes and 503 when it fails.

`health_metrics`

Contexts: location
Default: disabled

Enables a Prometheus-format metrics endpoint that exports probe state, health transitions, and counters.

`health_probe`

Contexts: location
Default: none

Sets the service-level probe target URL. Format is http[s]://host:port/path.

`health_probe_interval`

Contexts: location
Default: 5000ms

How often the active probe fires. Lower values detect failure faster but increase probe traffic.

`health_probe_timeout`

Contexts: location
Default: 1000ms

Socket-level timeout for the probe connect, send, and receive phases.

`health_probe_fails`

Contexts: location
Default: 2

Consecutive failures needed before the probe target is marked unhealthy. Prevents flapping from transient errors.

`health_probe_passes`

Contexts: location
Default: 1

Consecutive successes needed before an unhealthy target is marked healthy again.
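
The interval and the two thresholds together determine how quickly a state change is reported. As a rough estimate, assuming one probe fires per interval and a failed probe is counted once it errors or times out, detection takes up to about fails x interval and recovery up to about passes x interval. The sketch below annotates the readiness settings from the earlier example with that arithmetic; the figures are estimates, not measured values.

```nginx
location /ready {
    health_readiness;
    health_probe http://127.0.0.1:8080/health;

    health_probe_interval 3000ms;
    health_probe_timeout 1000ms;

    # Unhealthy after 3 consecutive failures:
    # up to about 3 x 3000ms = 9s after the backend stops responding,
    # plus up to one 1000ms timeout for the last probe
    health_probe_fails 3;

    # Healthy again after 2 consecutive successes:
    # up to about 2 x 3000ms = 6s after the backend recovers
    health_probe_passes 2;
}
```

Lowering the interval shortens both windows at the cost of more probe traffic; raising `health_probe_fails` trades slower detection for fewer false alarms.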

`health_probe_slow_start`

Contexts: location
Default: 0 (disabled)

Duration after recovery during which the peer is kept out of balancer rotation. Use this to let a recovering backend warm up before receiving traffic.

`health_probe_match`

Contexts: location
Default: none

Match rules for the probe response. Format: status=<min>-<max> [body=<str>]. Only responses matching the rule count as successful.
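
A sketch of a match rule following the format above. It assumes the backend answers with a 2xx status and the word `ok` in its body. How `<str>` is compared against the body, and whether it needs quoting when it contains spaces, is not specified here, so the example uses a single bare word.

```nginx
location /ready {
    health_readiness;
    health_probe http://127.0.0.1:8080/health;

    # Count a probe as successful only for a 200-299 status
    # whose body matches "ok" (assumed backend response)
    health_probe_match status=200-299 body=ok;
}
```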

`health_worker_events_channel`

Contexts: location
Default: none

Publishes service-level probe state transitions to the named channel in the worker-events default zone. Set this when other modules or tooling need to observe health flips.
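
A minimal sketch of publishing transitions. The channel name `health_transitions` is hypothetical; use whatever name your subscribers expect.

```nginx
location /ready {
    health_readiness;
    health_probe http://127.0.0.1:8080/health;

    # "health_transitions" is a hypothetical channel name
    health_worker_events_channel health_transitions;
}
```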

`health_upstream_probe`

Contexts: upstream
Default: none

Sets a per-upstream probe target. Overrides the service-level probe for this upstream group.

`health_upstream_probe_interval`

Contexts: upstream
Default: 5000ms

Per-upstream probe interval.

`health_upstream_probe_timeout`

Contexts: upstream
Default: 1000ms

Per-upstream probe timeout.

`health_upstream_probe_fails`

Contexts: upstream
Default: 2

Per-upstream fail threshold.

`health_upstream_probe_passes`

Contexts: upstream
Default: 1

Per-upstream pass threshold.

`health_upstream_probe_slow_start`

Contexts: upstream
Default: 0 (disabled)

Per-upstream slow-start duration. After recovery, a peer stays out of balancer rotation for this long.

`health_upstream_probe_match`

Contexts: upstream
Default: none

Per-upstream match rules for probe response validation.

`health_upstream_peer_probe`

Contexts: upstream
Default: none

Sets a per-peer probe target. Format: <addr> <http[s]://host:port/path>. This directive is repeatable for multiple peers.
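
A sketch of per-peer probes, one directive per peer, following the format above. It assumes `<addr>` is the peer address exactly as written on the corresponding `server` line.

```nginx
upstream backend {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;

    # Assumption: <addr> matches the address on the server line
    health_upstream_peer_probe 10.0.0.11:8080 http://10.0.0.11:8080/health;
    health_upstream_peer_probe 10.0.0.12:8080 http://10.0.0.12:8080/health;
}
```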

Variables

The module exports these nginx variables for use in logging or scripting:

Variable	Description
`$health_readiness`	Returns `1` if the service-level probe is passing, `0` otherwise
`$health_liveness`	Returns `1` when nginx is alive
`$health_backend_healthy_count`	Number of backends currently passing their probes
`$health_backend_total_count`	Total number of tracked backends
`$health_backend_failure_count`	Number of backends currently failing their probes
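
For logging, the variables can be dropped into a stock `log_format`. A minimal sketch; the format name `health` and the log path are illustrative.

```nginx
http {
    # "health" is a hypothetical format name
    log_format health '$remote_addr [$time_local] "$request" $status '
                      'ready=$health_readiness '
                      'healthy=$health_backend_healthy_count/$health_backend_total_count '
                      'failing=$health_backend_failure_count';

    access_log /var/log/nginx/access.log health;
}
```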

Behavior notes

Passive request and failure counters exclude the health endpoints themselves.
Probe results are shared across workers, but only worker 0 performs the periodic probe loops.
Liveness and readiness are intentionally different. Nginx can be alive while readiness is failing.
Upstream peer selection consumes probe state through the upstream balancer, which excludes unhealthy peers and peers still inside slow-start recovery.
If a peer has no probe configured, it is treated as healthy (fail-open semantics).

Works well with

Stock nginx `proxy_pass` and upstream `server` directives: health checks feed peer eligibility into standard nginx upstream selection.
Upstream Balancer, which excludes unhealthy and slow-starting peers at request time.
Worker Events, for publishing health transition notifications across workers.
Prometheus Metrics, for scraping health state into monitoring.