Advanced Topics

Nginx Reverse Proxy

To use Nginx as a reverse proxy in front of the gRPC service, disable the transport filter on both the proxy and the agent:

java -jar prometheus-proxy.jar --tf_disabled
java -jar prometheus-agent.jar --tf_disabled --config myconfig.conf

Or via environment variable:
TRANSPORT_FILTER_DISABLED=true

Or via config:
proxy.transportFilterDisabled = true
agent.transportFilterDisabled = true

Delayed Disconnect Detection

With transportFilterDisabled, agent disconnections are not immediately detected. Agent contexts on the proxy are removed after the inactivity timeout (default: 60 seconds, controlled by proxy.internal.maxAgentInactivitySecs).
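If disconnected agents need to be evicted faster than the 60-second default, the inactivity timeout can be lowered. A sketch, using the properties named above (the 15-second value is an arbitrary example, not a recommendation):

```
proxy.internal {
  staleAgentCheckEnabled = true
  maxAgentInactivitySecs = 15   // Evict disconnected agents after 15s instead of 60s
}
```

Very low values increase the risk of evicting agents that are merely slow, so tune this against your scrape interval.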

Example Nginx and proxy configuration files are available in the repository.
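A minimal Nginx gRPC pass-through might look like the following sketch; the listen port and hostname are assumptions, so consult the repository examples for the tested configuration:

```
# nginx.conf (sketch): accept HTTP/2 and forward gRPC traffic to the proxy
server {
    listen 50440 http2;                     # port Nginx listens on (example value)
    location / {
        grpc_pass grpc://localhost:50051;   # proxy's gRPC port (default 50051)
    }
}
```

The `grpc_pass` directive (from `ngx_http_grpc_module`) forwards the HTTP/2 stream unmodified, which is why the transport filter must be disabled on both sides.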

Prometheus Federation

Scrape an existing Prometheus instance via the /federate endpoint:

agent {
  pathConfigs: [
    {
      name: "Federated Prometheus"
      path: federated_metrics
      url: "http://prometheus-server:9090/federate?match[]={__name__=~\"job:.*\"}"
    }
  ]
}

This leverages Prometheus's built-in federation support, allowing you to pull metrics from another Prometheus server through the proxy.

A complete federation config is available at: examples/federate.conf
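On the scraping side, Prometheus then targets the proxy rather than the federated server directly. A sketch of the corresponding scrape config (the proxy hostname is an assumption; the path matches the pathConfig above):

```yaml
# prometheus.yml (sketch): scrape the federated metrics through the proxy
scrape_configs:
  - job_name: "federated"
    metrics_path: "/federated_metrics"          # path registered by the agent above
    static_configs:
      - targets: ["proxy.example.com:8080"]     # proxy's default scrape port is 8080
```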

Consolidated Mode

By default, each scrape path is owned by a single agent. If a second agent tries to register the same path, it displaces the first agent.

In consolidated mode, multiple agents can register the same path for redundancy:

agent.consolidated = true

When a scrape request arrives for a consolidated path, the proxy selects one of the available agents. If one agent disconnects, the remaining agents continue serving the path.

Use cases:

  • High availability -- multiple agents serving the same endpoints
  • Load distribution -- spread scrape load across agents
  • Rolling upgrades -- new agent registers before old one deregisters
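For example, two agent instances could run with the same path config and consolidation enabled; the target hostname below is an assumption:

```
agent {
  consolidated = true
  pathConfigs: [
    {
      name: "App metrics"
      path: app_metrics
      url: "http://app-host:8080/metrics"   // assumed scrape target
    }
  ]
}
```

Running this identical config on two agents registers `app_metrics` twice; the proxy picks one of the live agents for each scrape.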

gRPC Reflection

gRPC Reflection is enabled by default, allowing tools like grpcurl to inspect the service:

# List available gRPC services:
grpcurl -plaintext localhost:50051 list

# Output:
#   ProxyService
#   grpc.health.v1.Health
#   grpc.reflection.v1alpha.ServerReflection
# Describe the ProxyService:
grpcurl -plaintext localhost:50051 describe ProxyService

# Disable reflection:
java -jar prometheus-proxy.jar --ref_disabled

Note

When using grpcurl with the -plaintext option, ensure the proxy is running without TLS. When TLS is enabled, provide the appropriate certificate flags.

Performance Tuning

Concurrent Scraping

Increase the number of parallel scrapes for high-throughput scenarios:

# Increase concurrent scraping capacity:
java -jar prometheus-agent.jar \
  --max_concurrent_clients 10 \
  --client_timeout_secs 30 \
  --chunk 64 \
  --config myconfig.conf

# Tune HTTP client cache:
java -jar prometheus-agent.jar \
  --max_cache_size 200 \
  --max_cache_age_mins 60 \
  --max_cache_idle_mins 20 \
  --config myconfig.conf
Or via config:

agent {
  http {
    maxConcurrentClients = 10         // Parallel scrape limit
    clientTimeoutSecs = 30            // HTTP client timeout

    clientCache {
      maxSize = 200                   // More cached clients
      maxAgeMins = 60                 // Longer cache lifetime
      maxIdleMins = 20                // Longer idle tolerance
      cleanupIntervalMins = 10        // Less frequent cleanup
    }
  }

  chunkContentSizeKbs = 64            // Larger chunks for big payloads
  minGzipSizeBytes = 256              // Compress more aggressively
  scrapeTimeoutSecs = 30              // More time for slow targets
}

Key Tuning Parameters

Parameter             Default  Guidance
maxConcurrentClients  1        Increase for many endpoints or slow targets
clientTimeoutSecs     90       Lower for fast-failing scrapes
chunkContentSizeKbs   32       Increase for large payloads to reduce chunk count
minGzipSizeBytes      512      Lower to compress more aggressively
scrapeTimeoutSecs     15       Increase for slow targets
clientCache.maxSize   100      Increase if many unique auth credentials are used

gRPC Keepalive Tuning

Fine-tune gRPC keepalive behavior for specific network environments:

// Proxy gRPC keepalive settings:
proxy.grpc {
  keepAliveTimeSecs = 7200            // Interval between PING frames
  keepAliveTimeoutSecs = 20           // Timeout for PING ack
  permitKeepAliveWithoutCalls = false
  permitKeepAliveTimeSecs = 300       // Min interval between client PINGs
  maxConnectionIdleSecs = -1          // -1 = unlimited
  maxConnectionAgeSecs = -1           // -1 = unlimited
}

// Agent gRPC keepalive settings:
agent.grpc {
  keepAliveTimeSecs = -1              // -1 = use server default
  keepAliveTimeoutSecs = 20
  keepAliveWithoutCalls = false
}

See the gRPC keepalive guide for detailed tuning advice.

Stale Agent Cleanup

The proxy periodically checks for inactive agents and evicts them:

proxy.internal {
  staleAgentCheckEnabled = true
  maxAgentInactivitySecs = 60         // Evict after 60s of inactivity
  staleAgentCheckPauseSecs = 10       // Check every 10s

  scrapeRequestTimeoutSecs = 90       // Timeout for scrape requests
}

Info

When transportFilterDisabled is true, stale agent cleanup is automatically force-enabled, regardless of the staleAgentCheckEnabled setting. This ensures leaked agent contexts are eventually cleaned up.

Zipkin Tracing

Both proxy and agent support distributed tracing via Zipkin:

proxy.internal.zipkin {
  enabled = true
  hostname = "zipkin.example.com"
  port = 9411
  path = "api/v2/spans"
  serviceName = "prometheus-proxy"
}

agent.internal.zipkin {
  enabled = true
  hostname = "zipkin.example.com"
  port = 9411
  path = "api/v2/spans"
  serviceName = "prometheus-agent"
}