Monitoring¶
Monitor bindcar and BIND9 health, performance, and operations.
Monitoring Endpoints¶
Health Check¶
Use for: - Liveness probes - Basic uptime monitoring - Load balancer health checks
Readiness Check¶
Use for: - Readiness probes - Deployment readiness - Traffic routing decisions
Server Status¶
Returns BIND9 server statistics.
Logging¶
See Logging for detailed logging configuration.
Log Levels¶
error- Errors onlywarn- Warnings and errorsinfo- Normal operations (recommended)debug- Detailed debuggingtrace- Very verbose
Structured Logging¶
All logs are JSON format:
{
"timestamp": "2024-12-03T10:30:45Z",
"level": "info",
"message": "Zone created successfully",
"zone": "example.com"
}
Prometheus Metrics¶
bindcar exports comprehensive Prometheus metrics at /metrics:
Available Metrics¶
HTTP Request Metrics¶
bindcar_http_requests_total
- Type: Counter
- Labels: method, path, status
- Description: Total number of HTTP requests processed
bindcar_http_request_duration_seconds
- Type: Histogram
- Labels: method, path
- Buckets: 1ms, 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s
- Description: HTTP request duration in seconds
Zone Operation Metrics¶
bindcar_zone_operations_total
- Type: Counter
- Labels: operation, result
- Operations: create, delete, reload, freeze, thaw, notify, status
- Results: success, error
- Description: Total number of zone operations
bindcar_zones_managed_total
- Type: Gauge
- Description: Current number of zones managed
RNDC Command Metrics¶
bindcar_rndc_commands_total
- Type: Counter
- Labels: command, result
- Results: success, error
- Description: Total number of RNDC commands executed
bindcar_rndc_command_duration_seconds
- Type: Histogram
- Labels: command
- Buckets: 10ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s
- Description: RNDC command execution duration
Rate Limiting Metrics¶
bindcar_rate_limit_requests_total
- Type: Counter
- Labels: result
- Results: allowed, rejected
- Description: Total number of rate limit checks
Application Metrics¶
bindcar_app_info
- Type: Counter
- Labels: version
- Description: Application version information
Prometheus Configuration¶
Add bindcar to your Prometheus scrape configuration:
scrape_configs:
- job_name: 'bindcar'
static_configs:
- targets: ['bindcar:8080']
metrics_path: '/metrics'
Grafana Dashboard Example¶
Key queries for monitoring:
# Request rate
rate(bindcar_http_requests_total[5m])
# Request latency (p95)
histogram_quantile(0.95, rate(bindcar_http_request_duration_seconds_bucket[5m]))
# Error rate
sum(rate(bindcar_http_requests_total{status=~"5.."}[5m])) / sum(rate(bindcar_http_requests_total[5m]))
# Zone operations by type
rate(bindcar_zone_operations_total[5m])
# Rate limit rejections
rate(bindcar_rate_limit_requests_total{result="rejected"}[5m])
# RNDC command failures
rate(bindcar_rndc_commands_total{result="error"}[5m])
Kubernetes Monitoring¶
Probes¶
livenessProbe:
httpGet:
path: /api/v1/health
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
readinessProbe:
httpGet:
path: /api/v1/ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
Log Collection¶
Use a DaemonSet like Fluent Bit or Fluentd to collect JSON logs.
Alerting¶
Monitor for:
- Health endpoint failures (/api/v1/health returning non-200)
- 5xx error rates (bindcar_http_requests_total{status=~"5.."})
- RNDC command failures (bindcar_rndc_commands_total{result="error"})
- Authentication failures (401 errors)
- Rate limit rejections (bindcar_rate_limit_requests_total{result="rejected"})
- High request latency (p95 > 500ms)
Example Prometheus Alerts¶
groups:
- name: bindcar
rules:
- alert: BindcarHighErrorRate
expr: |
sum(rate(bindcar_http_requests_total{status=~"5.."}[5m]))
/ sum(rate(bindcar_http_requests_total[5m])) > 0.05
for: 5m
labels:
severity: warning
annotations:
summary: "High error rate in bindcar"
description: "Error rate is {{ $value | humanizePercentage }}"
- alert: BindcarRateLimitHigh
expr: rate(bindcar_rate_limit_requests_total{result="rejected"}[5m]) > 1
for: 5m
labels:
severity: info
annotations:
summary: "High rate limit rejections"
description: "Rate limiting is rejecting {{ $value }} requests/sec"
- alert: BindcarRndcFailures
expr: rate(bindcar_rndc_commands_total{result="error"}[5m]) > 0.1
for: 5m
labels:
severity: critical
annotations:
summary: "RNDC commands failing"
description: "RNDC failure rate: {{ $value }}/sec"
- alert: BindcarHighLatency
expr: |
histogram_quantile(0.95,
rate(bindcar_http_request_duration_seconds_bucket[5m])
) > 0.5
for: 10m
labels:
severity: warning
annotations:
summary: "High request latency"
description: "p95 latency is {{ $value }}s"
Next Steps¶
- Logging - Configure logging
- Troubleshooting - Debug issues