Troubleshooting
A symptom-driven guide to the failures Prometheus Proxy most commonly produces, with how to confirm each cause and how to fix it.
First steps
Before diving into a specific symptom, gather signal:
- Enable admin and metrics endpoints on both proxy and agent (`--admin --metrics`, or `ADMIN_ENABLED=true METRICS_ENABLED=true`). See Monitoring.
- Check liveness/health: `curl http://<host>:<admin-port>/ping` (expects `pong`) and `curl http://<host>:<admin-port>/healthcheck` (health JSON). Admin ports default to `8092` (proxy) and `8093` (agent).
- Scrape the path directly, bypassing Prometheus: `curl -i http://<proxy-host>:8080/<path>`. The HTTP status code tells you a lot (see below).
- Watch the outcome metric: `proxy_scrape_requests` is labeled by `type` (`success`, `timed_out`, `no_agents`, `path_not_found`, `payload_too_large`, …). Whichever `type` is incrementing names the failure mode.
- Read the logs. The proxy and agent log the reason for most rejections at WARN/INFO.
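To see which `type` is incrementing without a dashboard, two snapshots of the proxy's metrics output can be compared. The following is a minimal sketch, not part of the project: the function names are hypothetical, and the optional `_total` suffix on the metric name is an assumption about how the counter is exposed.

```python
import re
from collections import Counter

# Matches sample lines such as: proxy_scrape_requests{type="success"} 42.0
# The optional "_total" suffix is an assumption about the exposed counter name.
_SAMPLE = re.compile(r'^proxy_scrape_requests(?:_total)?\{([^}]*)\}\s+(\S+)')
_TYPE = re.compile(r'type="([^"]*)"')


def outcome_counts(metrics_text: str) -> Counter:
    """Return {type: count} for proxy_scrape_requests samples in metrics_text."""
    counts = Counter()
    for line in metrics_text.splitlines():
        sample = _SAMPLE.match(line.strip())
        if sample is None:
            continue  # comment line or a different metric
        label = _TYPE.search(sample.group(1))
        if label:
            counts[label.group(1)] += float(sample.group(2))
    return counts


def incrementing_types(before: str, after: str) -> dict:
    """Return the outcome types whose count grew between two snapshots."""
    old, new = outcome_counts(before), outcome_counts(after)
    return {t: new[t] - old[t] for t in new if new[t] > old[t]}
```

Feed it two responses from the proxy's metrics endpoint taken a few seconds apart; any key other than `success` in the result points at the matching section below.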
Agent can't connect to the proxy
Address types of NameResolver 'unix' ... not supported by transport
The gRPC client fell back to the unix name-resolver scheme on a hostname target because the
DNS resolver provider was missing from the shaded JAR.
- Cause: a fat JAR built before this was fixed dropped grpc's `DnsNameResolverProvider` from `META-INF/services`.
- Fix: use the 3.2.0 or newer `prometheus-proxy.jar`/`prometheus-agent.jar` (the providers are re-registered via `src/shadow/resources`). As a stopgap, addressing the proxy by IP instead of hostname also avoids DNS resolution.
Connection refused / timeouts on the gRPC port
- Confirm the agent's `PROXY_HOSTNAME` (and port, if not the default `50051`) points at the proxy's reachable address. In Kubernetes this is usually a `LoadBalancer` host; see Kubernetes.
- Confirm a firewall/security group allows the agent egress to the proxy gRPC port.
- gRPC is HTTP/2: if an L7 proxy or ingress sits in front, it must speak HTTP/2/gRPC.
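A quick way to separate network problems from gRPC problems is a plain TCP connect from the agent's host. This is a minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper name, and the default port is the `50051` mentioned above.

```python
import socket


def can_connect(host: str, port: int = 50051, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

A `True` result only shows that the port is reachable. It says nothing about whether an intermediary speaks HTTP/2 or whether the agent token matches, so a failing agent with a passing TCP check points at the later items in this section.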
UNAUTHENTICATED: Missing or invalid agent token
The proxy has an agent token configured and the agent's token is missing or doesn't match.
- Fix: set the same `--agent_token`/`AGENT_TOKEN` on the agent as on the proxy. An empty token on the proxy disables the check. See Security.
Prometheus scrape returns an error
Scrape the path directly with curl -i and match the status:
404 Not Found
The proxy has no usable mapping for that path.
- `metrics_path` mismatch: the Prometheus `metrics_path` must exactly equal the agent's `path` (with a single leading slash). `app1_metrics` in the agent → `metrics_path: '/app1_metrics'` in Prometheus.
- Multi-segment path: a registered `path` containing an embedded `/` is rejected at registration with `Multi-segment path not supported (use a single path segment)`, and scrapes 404. Use a single segment (e.g. `app_metrics`, not `app/metrics`).
- Agent not registered: confirm the agent started cleanly and registered the path (check `proxy_path_map_size` and the agent logs).
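The first two causes above can be checked mechanically before deploying a config. This is a small sketch with a hypothetical helper name; stripping leading and trailing slashes from the agent path is an assumption of the sketch, not documented proxy behavior.

```python
def prometheus_metrics_path(agent_path: str) -> str:
    """Return the metrics_path Prometheus should use for an agent path.

    Raises ValueError for an empty path or one with an embedded '/'.
    """
    # Assumption: surrounding slashes are not significant in the agent path.
    segment = agent_path.strip("/")
    if not segment:
        raise ValueError("path is empty")
    if "/" in segment:
        raise ValueError("multi-segment path: use a single path segment")
    return "/" + segment
```

For example, `prometheus_metrics_path("app1_metrics")` returns `/app1_metrics`, while `prometheus_metrics_path("app/metrics")` raises `ValueError`.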
503 Service Unavailable
The path is known but cannot be served right now.
- Proxy shutting down (`proxy_not_running`).
- No agent for the path (`no_agents`): the owning agent disconnected. Check `proxy_agent_map_size` and whether the agent is up.
- Agent disconnected mid-scrape (`agent_disconnected`). Transient during agent restarts; for redundancy use consolidated mode.
413 Payload Too Large
The scrape body exceeded a configured size limit.
- Agent side: raise `agent.http.maxContentLengthMBytes` (default `10`).
- Proxy side: raise `proxy.internal.maxUnzippedContentSizeMBytes` (the decompressed-size guard; `0` means "reject all").
- The proxy outcome label for this is `payload_too_large`.
Scrape hangs, then timed_out
The agent didn't return a response within the timeout.
- The target endpoint is slow: raise the agent's `scrapeTimeoutSecs` (default `15`) and/or `clientTimeoutSecs` (default `90`).
- The agent is saturated: raise `maxConcurrentClients` (default `1`) so scrapes run in parallel; watch `agent_scrape_backlog_size` and `proxy_cumulative_agent_backlog_size`.
- See Performance Tuning.
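When triaging from saved `curl -i` output, the status line can be turned into the outcome labels worth checking in `proxy_scrape_requests`. This is a rough sketch with hypothetical names. The 503 and 413 labels come from the sections above; pairing 404 with `path_not_found` is an assumption based on the label's name.

```python
import re

# Labels per status. The 404 entry is an assumption based on the label name.
LABELS_BY_STATUS = {
    404: ["path_not_found"],
    503: ["proxy_not_running", "no_agents", "agent_disconnected"],
    413: ["payload_too_large"],
}


def status_from_curl(output: str) -> int:
    """Extract the status code from the first status line in `curl -i` output."""
    match = re.search(r"^HTTP/\d(?:\.\d)?\s+(\d{3})", output, re.MULTILINE)
    if match is None:
        raise ValueError("no HTTP status line found")
    return int(match.group(1))


def labels_to_check(output: str) -> list:
    """Return the outcome labels associated with the response status, if any."""
    return LABELS_BY_STATUS.get(status_from_curl(output), [])
```

An empty list means the status is not one of the three covered here, in which case the logs are the better source.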
Registration is rejected
The proxy logs the reason and returns valid = false to the agent:
| Log reason | Cause / fix |
|---|---|
| `Multi-segment path not supported (use a single path segment)` | The path contains an embedded `/`. Use one segment. |
| `Consolidated agent rejected for non-consolidated path` | A consolidated agent tried to join a path another agent owns non-consolidated. Make all agents for the path consolidated, or none. |
| `Non-consolidated agent rejected for consolidated path` | The reverse of the above; same fix. |
See Consolidated Mode for the all-or-nothing rule.
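The table above can be read as a small decision procedure. The sketch below is a toy model of those three rows, not the proxy's actual code; the function name and the `path_modes` structure are hypothetical, and any case the table does not cover is simply treated as accepted here.

```python
from typing import Dict, Optional


def rejection_reason(
    path_modes: Dict[str, bool], path: str, consolidated: bool
) -> Optional[str]:
    """Return the log reason for rejecting a registration, or None if accepted.

    path_modes maps each already-registered path to True (consolidated)
    or False (non-consolidated).
    """
    if "/" in path.strip("/"):
        return "Multi-segment path not supported (use a single path segment)"
    existing = path_modes.get(path)
    if existing is None:
        return None  # nothing registered for this path yet
    if consolidated and not existing:
        return "Consolidated agent rejected for non-consolidated path"
    if not consolidated and existing:
        return "Non-consolidated agent rejected for consolidated path"
    return None  # modes agree; not covered by the table, accepted in this model
```

For instance, with `path_modes = {"app_metrics": False}`, a consolidated agent registering `app_metrics` gets the second reason in the table.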
TLS handshake failures
- Untrusted certificate: the side validating the cert needs the signing CA in its `trustCertCollectionFilePath`.
- Hostname / SAN mismatch: the cert's SAN must match the hostname the agent dials, or set `tls.overrideAuthority` (the test fixtures use `foo.test.google.fr`).
- Mutual auth: with mutual TLS the agent must also present `certChainFilePath`/`privateKeyFilePath`, and the proxy must trust the client CA. A server-only config will be rejected.
- For HTTPS scrape targets signed by a private CA, point the agent at `--https_truststore` rather than disabling validation with `--trust_all_x509`.
See TLS Setup and the example configs.
Metrics or admin endpoints return 404
The endpoints are disabled by default. Enable them with --admin / --metrics (or
ADMIN_ENABLED=true / METRICS_ENABLED=true) and scrape the metrics port (8082 proxy,
8083 agent), not the admin port.
Disconnects aren't detected behind Nginx
With transportFilterDisabled (required when fronting gRPC with Nginx), agent disconnects
aren't noticed immediately — stale contexts are cleaned up only after
proxy.internal.maxAgentInactivitySecs (default 60s). This is expected; see
Nginx Reverse Proxy.
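The delay follows from inactivity-based cleanup: an agent is only dropped once it has been silent for longer than the threshold. The sketch below is a simplified model of that idea, not the proxy's implementation; the function name, the `last_activity` structure, and the strict `>` comparison are all assumptions.

```python
import time
from typing import Dict, List, Optional


def stale_agents(
    last_activity: Dict[str, float],
    max_inactivity_secs: float = 60.0,
    now: Optional[float] = None,
) -> List[str]:
    """Return agent ids silent for longer than max_inactivity_secs.

    last_activity maps an agent id to the monotonic time of its last activity.
    """
    current = time.monotonic() if now is None else now
    return sorted(
        agent_id
        for agent_id, seen in last_activity.items()
        if current - seen > max_inactivity_secs
    )
```

In this model, an agent that vanished 30 seconds ago is still considered present, which is why its paths can keep failing scrapes until the inactivity window elapses.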
Still stuck?
- Cross-check metric meanings in Monitoring.
- Confirm option names and resolution order in the CLI Reference (precedence is CLI → env → config → defaults).
- File an issue at github.com/pambrose/prometheus-proxy/issues with the relevant proxy/agent logs and the direct `curl -i` output.