Running in Production
Guidance for deploying Prometheus Proxy in a production environment, pulling together the security, reliability, and tuning knobs documented elsewhere into one operational checklist.
Security
- Encrypt the gRPC channel with TLS, and prefer mutual TLS so the proxy authenticates agents and vice versa. See TLS Setup.
- Set an agent token (--agent_token / AGENT_TOKEN) as a lightweight app-level control in addition to TLS. An empty token leaves the agent port open and logs a startup warning.
- Segment the network so only trusted agents can reach the gRPC port (50051).
- Do not expose the admin port publicly. /threaddump and friends are operational tools, not public endpoints; keep 8092/8093 on an internal network only.
- When forwarding auth headers to targets, require TLS so credentials aren't sent in plaintext between proxy and agent. See Auth Header Forwarding.
High availability
- The proxy is a single instance per path namespace — agents connect to one proxy. Run a second, independent proxy with its own agents for a separate failure domain rather than load-balancing one path across two proxies.
- For agent redundancy, use consolidated mode: multiple agents register the same path, and the proxy keeps serving it as long as one remains connected. This also smooths rolling upgrades — the new agent registers before the old one drains.
- Set Kubernetes liveness/readiness probes to /ping and /healthcheck so unhealthy pods are restarted and kept out of rotation. See Kubernetes.
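Before wiring the probes, it can help to confirm that both endpoints answer from where the kubelet will call them. The following is a minimal Python sketch, not part of Prometheus Proxy: the helper names `http_status` and `check_probes` are hypothetical, the base URL is whatever host and admin port your deployment uses, and it assumes a healthy endpoint answers with HTTP 200.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=2.0):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return None


def check_probes(base_url, paths=("/ping", "/healthcheck"), fetch=http_status):
    """Map each probe path to True when it answers with HTTP 200 (assumed healthy)."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) == 200 for path in paths}
```

Calling `check_probes("http://localhost:8092")` would return a dictionary such as `{"/ping": True, "/healthcheck": True}` when both endpoints respond; the `fetch` parameter exists only so the check can be exercised without a live proxy.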
Sizing & tuning
Start from the defaults and adjust against the backlog and latency metrics:
| Parameter | Default | When to change |
|---|---|---|
| agent.maxConcurrentClients | 1 | Raise for many endpoints or slow targets |
| agent.scrapeTimeoutSecs | 15 | Raise for slow targets |
| agent.http.clientTimeoutSecs | 90 | Lower for fast-failing scrapes |
| agent.chunkContentSizeKbs | 32 | Raise for large payloads to cut chunk count |
| agent.minGzipSizeBytes | 512 | Lower to compress more aggressively |
| agent.http.maxContentLengthMBytes | 10 | Raise for large scrape bodies (guards agent heap) |
| proxy.internal.maxUnzippedContentSizeMBytes | — | Raise for large decompressed payloads |
| proxy.internal.maxAgentInactivitySecs | 60 | Tune stale-agent eviction window |
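To see how agent.chunkContentSizeKbs affects chunk count, the arithmetic can be sketched in Python. This is an illustration only: `chunk_count` is a hypothetical helper, it assumes 1 KB means 1024 bytes, and it treats the payload size as the number of bytes actually sent, which may differ from how the agent measures it.

```python
def chunk_count(payload_bytes, chunk_size_kbs=32):
    """Estimate how many chunks carry payload_bytes at the given chunk size.

    Assumes 1 KB = 1024 bytes; a payload that fits in one chunk counts as 1.
    """
    if payload_bytes < 0 or chunk_size_kbs <= 0:
        raise ValueError("payload_bytes must be >= 0 and chunk_size_kbs > 0")
    chunk_bytes = chunk_size_kbs * 1024
    # Integer ceiling division avoids floating-point rounding.
    return max(1, -(-payload_bytes // chunk_bytes))
```

Under these assumptions a 5 MiB payload needs `chunk_count(5 * 1024 * 1024, 32) == 160` chunks at the default, and `chunk_count(5 * 1024 * 1024, 128) == 40` after raising the chunk size to 128.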
Size the JVM heap for the largest decompressed payload times the scrape concurrency, and
watch agent_scrape_backlog_size / proxy_cumulative_agent_backlog_size — a steadily growing
backlog means agents can't keep up. See Performance Tuning.
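The heap rule above is a simple product, which a short Python sketch makes concrete. Only the multiplication comes from the guidance; the `headroom` multiplier and `baseline_mb` allowance are illustrative assumptions, not documented values, and both function names are hypothetical.

```python
def peak_payload_mb(max_unzipped_mb, concurrency):
    """Largest decompressed payload times the number of concurrent scrapes."""
    if max_unzipped_mb < 0 or concurrency < 1:
        raise ValueError("max_unzipped_mb must be >= 0 and concurrency >= 1")
    return max_unzipped_mb * concurrency


def suggested_heap_mb(max_unzipped_mb, concurrency, headroom=2.0, baseline_mb=256):
    """Rough heap suggestion: baseline plus peak payload scaled by headroom.

    headroom and baseline_mb are assumed placeholders; tune them from
    observed heap usage rather than treating them as recommendations.
    """
    return baseline_mb + peak_payload_mb(max_unzipped_mb, concurrency) * headroom
```

For example, a 10 MB largest payload with 8 concurrent scrapes gives a peak of 80 MB, and the placeholder defaults turn that into a 416 MB starting point to validate against real heap metrics.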
Observability
- Enable metrics on proxy and agent and scrape their internal /metrics. See Monitoring.
- Import the dashboards and alert rules from Grafana & Alerting.
- Alert on success rate, P99 latency, agent count, and backlog growth — the rules on that page cover each.
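As an illustration of what "backlog growth" could mean in practice, the Python sketch below flags a series of backlog samples (for instance, periodic readings of agent_scrape_backlog_size) that never falls and ends higher than it started. The function name, the `min_samples` threshold, and this definition of steady growth are assumptions for the example; the shipped alert rules may use a different condition.

```python
def is_steadily_growing(samples, min_samples=5):
    """Return True when samples never decrease and the last exceeds the first.

    samples is an ordered list of backlog readings, oldest first.
    Fewer than min_samples readings is treated as not enough evidence.
    """
    if len(samples) < min_samples:
        return False
    non_decreasing = all(later >= earlier for earlier, later in zip(samples, samples[1:]))
    return non_decreasing and samples[-1] > samples[0]
```

A series such as `[3, 5, 5, 9, 14]` is flagged, while a spiky but draining series such as `[3, 9, 2, 8, 1]` is not, which matches the idea that short bursts are normal and only sustained growth means agents can't keep up.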
Logging
- Leave requestLoggingEnabled on if you want per-scrape logs; they emit at DEBUG, so they won't flood INFO on a busy proxy.
- Use logLevel = "trace" for the most verbose output; the legacy "all" level was removed and now fails fast at startup.
Shutdown
- Standalone proxy/agent processes shut down cleanly on SIGTERM. Embedded agents should be stopped via EmbeddedAgentInfo.shutdown() (or close()), which blocks until terminated. See Embedded Agent.
Pre-flight checklist
- [ ] TLS (ideally mutual) enabled on the gRPC channel
- [ ] Agent token set, or mutual TLS in place
- [ ] gRPC port reachable by agents; admin port not publicly exposed
- [ ] Metrics enabled and scraped by Prometheus
- [ ] Dashboards imported and alert rules loaded
- [ ] maxConcurrentClients / timeouts tuned for your targets
- [ ] Content-size limits sized for your largest payload
- [ ] JVM heap sized for peak decompressed payload × concurrency
- [ ] Liveness/readiness probes wired to /ping and /healthcheck
- [ ] Consolidated mode configured where you need agent redundancy