Running in Production

Guidance for deploying Prometheus Proxy in a production environment, pulling together the security, reliability, and tuning knobs documented elsewhere into one operational checklist.

Security

Encrypt the gRPC channel with TLS, and prefer mutual TLS so the proxy authenticates agents and vice versa. See TLS Setup.
Set an agent token (--agent_token / AGENT_TOKEN) as a lightweight app-level control in addition to TLS. An empty token leaves the agent port open and logs a startup warning.
Segment the network so only trusted agents can reach the gRPC port (50051).
Do not expose the admin port publicly. /threaddump and the other admin endpoints are operational tools, not public endpoints; keep 8092/8093 on an internal network only.
When forwarding auth headers to targets, require TLS so credentials aren't sent in plaintext between proxy and agent. See Auth Header Forwarding.

High availability

The proxy is a single instance per path namespace — agents connect to one proxy. Run a second, independent proxy with its own agents for a separate failure domain rather than load-balancing one path across two proxies.
For agent redundancy, use consolidated mode: multiple agents register the same path, and the proxy keeps serving it as long as one remains connected. This also smooths rolling upgrades — the new agent registers before the old one drains.
Set Kubernetes liveness/readiness probes to /ping and /healthcheck so unhealthy pods are restarted and kept out of rotation. See Kubernetes.
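
For a quick manual check of the same endpoints the probes use, a minimal Python sketch follows. It assumes the admin endpoints are served over plain HTTP and that a 2xx status means healthy; the function name probe and the base URL in the usage note are illustrative, not part of Prometheus Proxy.

```python
import urllib.error
import urllib.request


def probe(base_url: str, path: str, timeout_secs: float = 2.0) -> bool:
    """Return True if GET base_url + path answers with an HTTP 2xx status.

    Connection failures, timeouts, and non-2xx responses all count as
    unhealthy (assumption: a probe treats any of these as a failure).
    """
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout_secs) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # HTTPError (4xx/5xx) is a subclass of URLError, so it lands here too.
        return False
```

For example, probe("http://localhost:8092", "/ping") and probe("http://localhost:8092", "/healthcheck") would check a proxy whose admin port is reachable on localhost, which is an assumption about your deployment.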

Sizing & tuning

Start from the defaults and adjust against the backlog and latency metrics:

Parameter	Default	When to change
`agent.maxConcurrentClients`	1	Raise for many endpoints or slow targets
`agent.scrapeTimeoutSecs`	15	Raise for slow targets
`agent.http.clientTimeoutSecs`	90	Lower so scrapes of unresponsive targets fail fast
`agent.chunkContentSizeKbs`	32	Raise for large payloads to cut chunk count
`agent.minGzipSizeBytes`	512	Lower to compress more aggressively
`agent.http.maxContentLengthMBytes`	10	Raise for large scrape bodies (guards agent heap)
`proxy.internal.maxUnzippedContentSizeMBytes`	—	Raise for large decompressed payloads
`proxy.internal.maxAgentInactivitySecs`	60	Tune stale-agent eviction window
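
To get a feel for how agent.chunkContentSizeKbs affects chunk count, here is a minimal Python sketch. It assumes a payload is split into fixed-size chunks of chunkContentSizeKbs × 1024 bytes, which may differ from the agent's exact behavior; the function name chunk_count is illustrative.

```python
def chunk_count(payload_bytes: int, chunk_content_size_kbs: int = 32) -> int:
    """Estimate how many chunks a payload of payload_bytes is split into.

    Assumes fixed-size chunks of chunk_content_size_kbs * 1024 bytes,
    with a final partial chunk for any remainder.
    """
    if payload_bytes < 0:
        raise ValueError("payload_bytes must be non-negative")
    if chunk_content_size_kbs <= 0:
        raise ValueError("chunk_content_size_kbs must be positive")
    chunk_bytes = chunk_content_size_kbs * 1024
    # Integer ceiling division: avoids float rounding on large payloads.
    return -(-payload_bytes // chunk_bytes)
```

Under these assumptions, a 5 MiB payload (5 * 1024 * 1024 bytes) takes 160 chunks at the default of 32 and 40 chunks at 128.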

Size the JVM heap to hold at least the largest decompressed payload multiplied by the scrape concurrency, and watch agent_scrape_backlog_size / proxy_cumulative_agent_backlog_size: a steadily growing backlog means agents can't keep up. See Performance Tuning.
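
The heap rule above is simple arithmetic, sketched here in Python. The headroom_factor parameter is a hypothetical safety margin, not a Prometheus Proxy setting, and the result covers payload buffers only, not the JVM's baseline usage.

```python
def payload_heap_mbytes(
    largest_decompressed_mbytes: float,
    scrape_concurrency: int,
    headroom_factor: float = 1.5,
) -> float:
    """Estimate heap (in MB) needed to hold peak in-flight payloads.

    peak = largest decompressed payload * concurrent scrapes,
    scaled by headroom_factor (hypothetical margin, must be >= 1).
    """
    if largest_decompressed_mbytes < 0 or scrape_concurrency < 1:
        raise ValueError("payload must be >= 0 and concurrency >= 1")
    if headroom_factor < 1:
        raise ValueError("headroom_factor must be >= 1")
    return largest_decompressed_mbytes * scrape_concurrency * headroom_factor
```

For example, a 10 MB largest payload with 4 concurrent scrapes gives 40 MB of peak payload data, or 60 MB with the assumed 1.5 margin. Add that on top of whatever the process needs at idle.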

Observability

Enable metrics on proxy and agent and scrape their internal /metrics. See Monitoring.
Import the dashboards and alert rules from Grafana & Alerting.
Alert on success rate, P99 latency, agent count, and backlog growth — the rules on that page cover each.
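
To make "backlog growth" concrete, here is a minimal Python sketch of one way to classify a window of backlog samples. It is not the alert rule from Grafana & Alerting; the function name, the default min_samples of 5, and the definition of "steadily growing" (never decreasing, with a net increase over the window) are all assumptions for illustration.

```python
from typing import Sequence


def backlog_is_growing(samples: Sequence[float], min_samples: int = 5) -> bool:
    """Return True if the backlog never drops across the window and ends higher.

    samples: backlog sizes in time order (oldest first).
    min_samples: hypothetical minimum window length before judging a trend.
    """
    if len(samples) < min_samples:
        return False  # too little data to call it a trend
    never_drops = all(
        later >= earlier for earlier, later in zip(samples, samples[1:])
    )
    return never_drops and samples[-1] > samples[0]
```

A backlog that spikes and drains, such as 5, 2, 6, 1, 4, is not flagged, while 0, 1, 3, 3, 7 is. A flat non-zero backlog is also not flagged, so pair a check like this with a threshold on absolute size if that matters for your targets.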

Logging

Leave requestLoggingEnabled on if you want per-scrape logs — they emit at DEBUG, so they won't flood INFO on a busy proxy.
Use logLevel = "trace" for the most verbose output; the legacy "all" level was removed and now fails fast at startup.

Shutdown

Standalone proxy/agent processes shut down cleanly on SIGTERM. Embedded agents should be stopped via EmbeddedAgentInfo.shutdown() (or close()), which blocks until terminated. See Embedded Agent.

Pre-flight checklist

[ ] TLS (ideally mutual) enabled on the gRPC channel
[ ] Agent token set, or mutual TLS in place
[ ] gRPC port reachable by agents; admin port not publicly exposed
[ ] Metrics enabled and scraped by Prometheus
[ ] Dashboards imported and alert rules loaded
[ ] maxConcurrentClients / timeouts tuned for your targets
[ ] Content-size limits sized for your largest payload
[ ] JVM heap sized for peak decompressed payload × concurrency
[ ] Liveness/readiness probes wired to /ping and /healthcheck
[ ] Consolidated mode configured where you need agent redundancy