Skip to content

Monitoring

Monitor bindcar and BIND9 health, performance, and operations.

Monitoring Endpoints

Health Check

curl http://localhost:8080/api/v1/health

Use for: - Liveness probes - Basic uptime monitoring - Load balancer health checks

Readiness Check

curl http://localhost:8080/api/v1/ready

Use for: - Readiness probes - Deployment readiness - Traffic routing decisions

Server Status

curl http://localhost:8080/api/v1/server/status \
  -H "Authorization: Bearer $TOKEN"

Returns BIND9 server statistics.

Logging

See Logging for detailed logging configuration.

Log Levels

  • error - Errors only
  • warn - Warnings and errors
  • info - Normal operations (recommended)
  • debug - Detailed debugging
  • trace - Very verbose

Structured Logging

All logs are JSON format:

{
  "timestamp": "2024-12-03T10:30:45Z",
  "level": "info",
  "message": "Zone created successfully",
  "zone": "example.com"
}

Prometheus Metrics

bindcar exports comprehensive Prometheus metrics at /metrics:

curl http://localhost:8080/metrics

Available Metrics

HTTP Request Metrics

bindcar_http_requests_total - Type: Counter - Labels: method, path, status - Description: Total number of HTTP requests processed

bindcar_http_request_duration_seconds - Type: Histogram - Labels: method, path - Buckets: 1ms, 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s - Description: HTTP request duration in seconds

Zone Operation Metrics

bindcar_zone_operations_total - Type: Counter - Labels: operation, result - Operations: create, delete, reload, freeze, thaw, notify, status - Results: success, error - Description: Total number of zone operations

bindcar_zones_managed_total - Type: Gauge - Description: Current number of zones managed

RNDC Command Metrics

bindcar_rndc_commands_total - Type: Counter - Labels: command, result - Results: success, error - Description: Total number of RNDC commands executed

bindcar_rndc_command_duration_seconds - Type: Histogram - Labels: command - Buckets: 10ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s - Description: RNDC command execution duration

Rate Limiting Metrics

bindcar_rate_limit_requests_total - Type: Counter - Labels: result - Results: allowed, rejected - Description: Total number of rate limit checks

Application Metrics

bindcar_app_info - Type: Counter - Labels: version - Description: Application version information

Prometheus Configuration

Add bindcar to your Prometheus scrape configuration:

scrape_configs:
  - job_name: 'bindcar'
    static_configs:
      - targets: ['bindcar:8080']
    metrics_path: '/metrics'

Grafana Dashboard Example

Key queries for monitoring:

# Request rate
rate(bindcar_http_requests_total[5m])

# Request latency (p95)
histogram_quantile(0.95, rate(bindcar_http_request_duration_seconds_bucket[5m]))

# Error rate
sum(rate(bindcar_http_requests_total{status=~"5.."}[5m])) / sum(rate(bindcar_http_requests_total[5m]))

# Zone operations by type
rate(bindcar_zone_operations_total[5m])

# Rate limit rejections
rate(bindcar_rate_limit_requests_total{result="rejected"}[5m])

# RNDC command failures
rate(bindcar_rndc_commands_total{result="error"}[5m])

Kubernetes Monitoring

Probes

livenessProbe:
  httpGet:
    path: /api/v1/health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /api/v1/ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5

Log Collection

Use a DaemonSet like Fluent Bit or Fluentd to collect JSON logs.

Alerting

Monitor for: - Health endpoint failures (/api/v1/health returning non-200) - 5xx error rates (bindcar_http_requests_total{status=~"5.."}) - RNDC command failures (bindcar_rndc_commands_total{result="error"}) - Authentication failures (401 errors) - Rate limit rejections (bindcar_rate_limit_requests_total{result="rejected"}) - High request latency (p95 > 500ms)

Example Prometheus Alerts

groups:
  - name: bindcar
    rules:
      - alert: BindcarHighErrorRate
        expr: |
          sum(rate(bindcar_http_requests_total{status=~"5.."}[5m]))
          / sum(rate(bindcar_http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate in bindcar"
          description: "Error rate is {{ $value | humanizePercentage }}"

      - alert: BindcarRateLimitHigh
        expr: rate(bindcar_rate_limit_requests_total{result="rejected"}[5m]) > 1
        for: 5m
        labels:
          severity: info
        annotations:
          summary: "High rate limit rejections"
          description: "Rate limiting is rejecting {{ $value }} requests/sec"

      - alert: BindcarRndcFailures
        expr: rate(bindcar_rndc_commands_total{result="error"}[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "RNDC commands failing"
          description: "RNDC failure rate: {{ $value }}/sec"

      - alert: BindcarHighLatency
        expr: |
          histogram_quantile(0.95,
            rate(bindcar_http_request_duration_seconds_bucket[5m])
          ) > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High request latency"
          description: "p95 latency is {{ $value }}s"

Next Steps