Kubernetes Observability
The Three Pillars
- Metrics: numeric time series
- Logs: event records
- Traces: request paths
Metrics with Prometheus
Architecture
- Pull-based collection (scraping)
- Time-Series Database
- PromQL query language
- Service Discovery
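Pull-based collection means each target exposes its current values over HTTP and Prometheus scrapes them on an interval. As a minimal sketch of what a target returns, the following renders one metric family in the Prometheus text exposition format using only the Python standard library. The helper `render_metrics` and the metric `demo_requests_total` are hypothetical names chosen for illustration, not part of any Prometheus client library.

```python
def escape_label_value(value: str) -> str:
    # The text exposition format escapes backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(name: str, metric_type: str, help_text: str, samples: dict) -> str:
    """Render one metric family in the Prometheus text exposition format.

    samples maps a tuple of (label, value) pairs to a numeric sample value.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples.items():
        if labels:
            label_str = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels)
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical samples; a real target would serve this body at /metrics.
body = render_metrics(
    "demo_requests_total",
    "counter",
    "Total HTTP requests handled.",
    {
        (("method", "GET"), ("code", "200")): 1027,
        (("method", "POST"), ("code", "500")): 3,
    },
)
print(body)
```

In practice an official client library would typically handle this rendering; the sketch only shows the shape of the scraped payload.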
Metric Types
- Counter: monotonically increasing
- Gauge: can go up or down
- Histogram: distributions
- Summary: quantiles
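The difference between these types is easiest to see in how they store state. The sketch below models a counter, a gauge, and a histogram in plain Python; the class names are illustrative stand-ins, not a client library API. A summary is omitted here: it would compute quantiles on the client side instead of exposing buckets.

```python
import bisect


class Counter:
    """Monotonically increasing value; it only resets when the process restarts."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """Point-in-time value that can go up or down."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value


class Histogram:
    """Counts observations into buckets defined by upper bounds."""

    def __init__(self, bounds: list) -> None:
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is the +Inf bucket
        self.sum = 0.0

    def observe(self, value: float) -> None:
        # bisect_left finds the first bucket whose upper bound is >= value.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value

    def cumulative(self) -> list:
        # Buckets are exposed cumulatively: each one includes all smaller ones.
        total, out = 0, []
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            total += count
            out.append((bound, total))
        return out


latency = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.5, 2.0):
    latency.observe(seconds)
print(latency.cumulative())  # [(0.1, 1), (0.5, 3), (1.0, 3), (inf, 4)]
```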
Key Metrics
- container_cpu_usage_seconds_total
- container_memory_usage_bytes
- kube_pod_status_phase
- kube_deployment_status_replicas
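Metrics ending in `_total`, such as `container_cpu_usage_seconds_total`, are counters, so they are usually read as a per-second rate rather than as a raw value. The following is a simplified sketch of that idea, including counter-reset handling. It does not reproduce the extrapolation that PromQL's `rate()` performs, and the sample values are invented for illustration.

```python
def counter_rate(samples: list) -> float:
    """Average per-second increase over a list of (timestamp, value) samples.

    A drop in value is treated as a counter reset (for example a container
    restart), so the post-reset value counts as the increase since the reset.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    return increase / elapsed


# Hypothetical scrapes of container_cpu_usage_seconds_total, 15 s apart,
# with a container restart between t=30 and t=45.
cpu_samples = [(0, 100.0), (15, 103.0), (30, 106.0), (45, 1.5), (60, 4.5)]
cores_used = counter_rate(cpu_samples)
print(f"{cores_used:.3f} CPU cores")  # 0.175 CPU cores
```

Because the counter measures CPU seconds, its rate reads directly as the average number of cores in use over the window.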
Dashboards with Grafana
Features
- Multi-Datasource
- Dashboard Templates
- Alerting
- Annotations
Standard Dashboards
- Kubernetes Cluster Overview
- Node Exporter
- Pod Resources
- Ingress Controller
Logs with Loki
Architecture
- Label-based indexing
- Object Storage Backend
- LogQL query language
- Promtail Agent
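Label-based indexing means that only the label set of each log stream is indexed, while the log lines themselves are stored unindexed and filtered at query time. The toy store below illustrates that two-step lookup. `LabelIndexedLogStore` is a hypothetical class for illustration and says nothing about how Loki is implemented internally.

```python
from collections import defaultdict


class LabelIndexedLogStore:
    """Toy store that indexes only the label set of each stream, not log contents."""

    def __init__(self) -> None:
        # Maps a frozenset of (label, value) pairs to a list of (timestamp, line).
        self.streams = defaultdict(list)

    def push(self, labels: dict, ts: int, line: str) -> None:
        self.streams[frozenset(labels.items())].append((ts, line))

    def query(self, matchers: dict, contains: str = "") -> list:
        wanted = set(matchers.items())
        hits = []
        for stream_labels, entries in self.streams.items():
            # Step 1: select streams by label (the indexed part).
            if wanted <= stream_labels:
                # Step 2: brute-force filter the lines of the selected streams.
                hits.extend(e for e in entries if contains in e[1])
        return sorted(hits)


store = LabelIndexedLogStore()
store.push({"namespace": "shop", "app": "cart"}, 1, "GET /cart 200")
store.push({"namespace": "shop", "app": "cart"}, 2, "error: redis timeout")
store.push({"namespace": "shop", "app": "pay"}, 3, "error: card declined")

result = store.query({"app": "cart"}, contains="error")
print(result)  # [(2, 'error: redis timeout')]
```

The query corresponds roughly to the LogQL expression `{app="cart"} |= "error"`: a label selector followed by a line filter.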
Log Collection
- Promtail
- Fluentd/Fluent Bit
- Vector
- Grafana Alloy
Tracing with Jaeger
Concepts
- Spans: individual operations
- Traces: the complete request
- Context Propagation
- Sampling
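Context propagation and sampling can be sketched with the W3C `traceparent` header, which carries a version, a 32-hex-digit trace ID, a 16-hex-digit parent span ID, and trace flags. The helper names below are hypothetical, and the sampling function is a simplified ratio-based scheme, not the exact algorithm of any particular SDK.

```python
import secrets


def new_traceparent(sampled: bool) -> str:
    """Start a trace: random 16-byte trace ID and 8-byte span ID, hex-encoded."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def child_traceparent(parent: str) -> str:
    """Continue a trace downstream: same trace ID and flags, new span ID."""
    version, trace_id, _parent_span_id, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def should_sample(trace_id: str, ratio: float) -> bool:
    """Deterministic head sampling keyed on the trace ID.

    Every service reaches the same decision for the same trace ID, so a
    trace is kept or dropped as a whole.
    """
    return int(trace_id[-16:], 16) < ratio * 2**64


# Service A starts a trace and sends the header to service B.
root = new_traceparent(sampled=True)
# Service B creates its own span but stays in the same trace.
child = child_traceparent(root)

print(root)
print(child)
```

Both headers share the trace ID, which is what lets a tracing backend stitch spans from different services into one trace.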
Integration
- OpenTelemetry SDK
- Auto-Instrumentation
- Service-to-Service Tracing
OpenTelemetry
A Standard for Observability
- Unified API
- Vendor-neutral
- Auto-instrumentation
- OTLP Protocol
Components
- SDK: Instrumentation
- Collector: Processing
- Exporters: Backend-Integration
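The division of labour between these components can be pictured as a pipeline: instrumented code emits records, a collector processes and batches them, and exporters hand each batch to a backend. The sketch below is a toy model of that flow under simplifying assumptions. `ToyCollector`, the processor functions, and the record fields are all hypothetical and do not mirror the OpenTelemetry Collector's actual configuration or API.

```python
from typing import Callable, List


class ToyCollector:
    """Receives records, runs processors in order, and exports in batches."""

    def __init__(self, processors: List[Callable], exporters: List[Callable],
                 batch_size: int = 2) -> None:
        self.processors = processors
        self.exporters = exporters
        self.batch_size = batch_size
        self.buffer: List[dict] = []

    def receive(self, record: dict) -> None:
        for process in self.processors:
            record = process(record)
            if record is None:  # a processor may drop a record
                return
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        batch, self.buffer = self.buffer, []
        for export in self.exporters:
            export(batch)


def drop_health_checks(record: dict):
    # Filtering processor: discard noisy health-check spans.
    return None if record.get("path") == "/healthz" else record


def add_cluster(record: dict) -> dict:
    # Enrichment processor: attach a (made-up) cluster attribute.
    return {**record, "cluster": "demo"}


exported = []  # stands in for a backend; each element is one exported batch
collector = ToyCollector([drop_health_checks, add_cluster], [exported.append])

# The "SDK" side: instrumented code emitting records.
collector.receive({"name": "GET /cart", "path": "/cart"})
collector.receive({"name": "GET /healthz", "path": "/healthz"})
collector.receive({"name": "POST /pay", "path": "/pay"})

print(exported)
```

Keeping processing in the collector rather than in each application is what allows the backend to be swapped by changing exporters only.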
Alerting
Prometheus Alertmanager
- Rule-based alerts
- Routing and grouping
- Silencing
- Notification Channels
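Grouping and silencing can be illustrated with a small function: silenced alerts are dropped, and the remainder are bucketed by a chosen set of labels so that each bucket produces one notification. This is a simplified sketch. `group_alerts` is a hypothetical helper, and silences here use exact label equality only, whereas Alertmanager matchers are more expressive.

```python
from collections import defaultdict


def group_alerts(alerts: list, group_by: list, silences: list) -> dict:
    """Drop silenced alerts, then bucket the rest by their group_by label values."""
    groups = defaultdict(list)
    for alert in alerts:
        labels = alert["labels"]
        # A silence is modelled as label=value pairs that must all match.
        if any(all(labels.get(k) == v for k, v in s.items()) for s in silences):
            continue
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


alerts = [
    {"labels": {"alertname": "HighErrorRate", "cluster": "prod", "pod": "cart-1"}},
    {"labels": {"alertname": "HighErrorRate", "cluster": "prod", "pod": "cart-2"}},
    {"labels": {"alertname": "DiskFull", "cluster": "staging", "node": "n1"}},
]
silences = [{"cluster": "staging"}]  # e.g. during planned maintenance

grouped = group_alerts(alerts, ["alertname", "cluster"], silences)
for key, members in grouped.items():
    print(key, len(members))  # ('HighErrorRate', 'prod') 2
```

Two firing pods thus collapse into a single notification, and the staging alert is suppressed entirely.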
Alert Best Practices
- Symptom-based
- Actionable
- Deduplication
- Escalation
Kubernetes-native Tools
- metrics-server: Resource Metrics
- kube-state-metrics: Object Metrics
- node-exporter: Node Metrics
Managed Observability
- Datadog
- New Relic
- Grafana Cloud
- AWS CloudWatch Container Insights
SLI/SLO Monitoring
Service Level Indicators
- Latency (p50, p95, p99)
- Error Rate
- Throughput
- Saturation
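To make the latency and error-rate indicators concrete, the sketch below computes p50, p95, and p99 with the nearest-rank method and an error rate from a small set of requests. The request data is invented for illustration. Monitoring systems such as Prometheus typically estimate percentiles from histogram buckets instead of raw values, so this shows the definition rather than the production approach.

```python
import math


def percentile(values: list, p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    if not values or not 0 < p <= 100:
        raise ValueError("need data and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


# Hypothetical requests as (latency in ms, HTTP status).
requests = [
    (12, 200), (15, 200), (18, 200), (20, 200), (22, 200),
    (25, 200), (30, 200), (45, 200), (120, 200), (900, 500),
]

latencies = [latency for latency, _ in requests]
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
p99 = percentile(latencies, 99)
error_rate = sum(1 for _, status in requests if status >= 500) / len(requests)

print(p50, p95, p99, error_rate)  # 22 900 900 0.1
```

The gap between p50 and p95 is why tail percentiles are tracked separately: a single slow request dominates the tail while leaving the median untouched.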
Service Level Objectives
- Target Values
- Error Budgets
- Burn Rate Alerts
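The relationship between these three items is simple arithmetic: the error budget is the allowed failure fraction (1 minus the SLO target), and the burn rate is the observed error ratio divided by that budget. The sketch below shows this together with a two-window paging check. The function names are hypothetical, and the threshold of 14.4 is an assumed value: it corresponds to spending 2% of a 30-day budget in one hour and should be tuned per service.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail, e.g. about 0.001 for a 99.9% SLO."""
    return 1.0 - slo_target


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' the budget is being spent."""
    return error_ratio / error_budget(slo_target)


def should_page(short_window_ratio: float, long_window_ratio: float,
                slo_target: float, threshold: float = 14.4) -> bool:
    """Multi-window check: both windows must burn faster than the threshold.

    The long window shows the problem is significant; the short window shows
    it is still happening, so the alert clears soon after recovery.
    """
    return (burn_rate(short_window_ratio, slo_target) >= threshold
            and burn_rate(long_window_ratio, slo_target) >= threshold)


slo = 0.999  # 99.9% of requests should succeed

# 2% of requests failing burns the budget roughly 20 times too fast.
print(round(burn_rate(0.02, slo), 2))

# Errors in the last 5 minutes and the last hour are both elevated: page.
print(should_page(short_window_ratio=0.02, long_window_ratio=0.016, slo_target=slo))

# Short spike only, the hour as a whole is on budget: do not page.
print(should_page(short_window_ratio=0.02, long_window_ratio=0.001, slo_target=slo))
```

A burn rate of 1 means the budget runs out exactly at the end of the SLO period; higher values exhaust it proportionally sooner.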
CFTools Software implements comprehensive observability solutions for Kubernetes.