Prometheus Monitoring Stack

Overview

The Prometheus monitoring stack monitors your Kubernetes cluster with the following components:

- Prometheus - Metrics collection and alerting
- Grafana - Visualization dashboards
- Alertmanager - Alert routing and management
- Node Exporter - Node-level metrics
- Kube State Metrics - Kubernetes object metrics

Access URLs

After deployment, you can access the monitoring services via:

- Grafana: https://grafana.skatzi.com
  - Username: admin
  - Password: admin123 (change this!)
- Prometheus: https://prometheus.skatzi.com
  - Direct access to the Prometheus query interface
- Alertmanager: https://alertmanager.skatzi.com
  - Alert management interface

Features

Grafana Dashboards

Pre-installed dashboards include:

- Kubernetes cluster overview
- Node resource usage
- Pod resource usage
- Persistent volume monitoring
- Network policies
- And many more!

Storage

All components use Hetzner Cloud volumes for persistence:

- Prometheus: 20Gi (30-day retention)
- Grafana: 5Gi (dashboards and config)
- Alertmanager: 2Gi (alert history)
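
For orientation, the Prometheus figures above would map onto the Prometheus Operator's `Prometheus` custom resource roughly as follows. This is a sketch under assumptions, not this cluster's actual manifest: the resource name `k8s` is hypothetical, and `hcloud-volumes` is assumed to be the storage class installed by the Hetzner CSI driver (check with `kubectl get storageclass`).

```yaml
# Fragment of a Prometheus custom resource (Prometheus Operator).
# The name "k8s" and the storage class name are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  retention: 30d                 # matches the 30-day retention stated above
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: hcloud-volumes   # assumed Hetzner CSI storage class
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi        # matches the Prometheus volume size stated above
```

If the stack is installed through a Helm chart, these settings are usually changed in the chart values rather than by editing the resource directly.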

Monitoring Coverage

The stack automatically monitors:

- All cluster nodes (CPU, memory, disk, network)
- All pods and containers
- Kubernetes API server
- CoreDNS
- Kubelet
- etcd (if accessible)
- Custom applications (via ServiceMonitors)

Adding Custom Metrics

To monitor your own applications, create ServiceMonitor resources:

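apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-metrics
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: my-app        # must match the labels on the application's Service
  endpoints:
  - port: metrics        # name of the Service port that serves metrics
    path: /metrics       # HTTP path scraped by Prometheus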
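
A ServiceMonitor finds its targets through a Service, so the `app: my-app` label and the port name `metrics` must exist on a Service. The sketch below shows a matching Service; the namespace `my-app`, the port number 8080, and the pod label are hypothetical. Because a ServiceMonitor looks only in its own namespace by default, the second document repeats the ServiceMonitor above with a `namespaceSelector` added, for the case where the application runs outside `monitoring`.

```yaml
# Hypothetical Service for the application; names and port number are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-app
  labels:
    app: my-app            # matched by spec.selector.matchLabels in the ServiceMonitor
spec:
  selector:
    app: my-app            # selects the application's pods
  ports:
  - name: metrics          # referenced by "port: metrics" in the ServiceMonitor
    port: 8080
    targetPort: 8080
---
# The ServiceMonitor from above, extended to look for Services in the my-app namespace.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-metrics
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
    - my-app
  selector:
    matchLabels:
      app: my-app
  endpoints:
  - port: metrics
    path: /metrics
```

Depending on how Prometheus is configured, the ServiceMonitor may also need a label that the Prometheus resource's `serviceMonitorSelector` matches; inspect the Prometheus resource to confirm.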

Alerting

Default alert rules are included for:

- High CPU/Memory usage
- Pod crash loops
- Persistent volume issues
- Node unavailability
- And more...
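
Rules for your own applications can sit alongside the defaults as `PrometheusRule` resources. The following is a minimal sketch, assuming the Prometheus Operator CRDs are installed (as the ServiceMonitor usage suggests). The rule name, the `my-app` namespace filter, the threshold, and the `severity` label are all hypothetical.

```yaml
# Hypothetical custom alert rule; names, threshold, and labels are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-rules
  namespace: monitoring
spec:
  groups:
  - name: my-app.rules
    rules:
    - alert: MyAppContainerRestarting
      # More than 3 container restarts within 15 minutes (metric from Kube State Metrics)
      expr: increase(kube_pod_container_status_restarts_total{namespace="my-app"}[15m]) > 3
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} is restarting frequently"
```

Whether Prometheus loads the rule depends on the `ruleSelector` of the Prometheus resource, so an extra metadata label may be required.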

Configure Alertmanager to send notifications via:

- Email
- Slack
- PagerDuty
- Webhooks
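
One way to wire up a receiver, if the Prometheus Operator manages Alertmanager, is an `AlertmanagerConfig` resource. The sketch below routes alerts to Slack; the Secret `slack-webhook`, its key `url`, the channel name, and the timing values are assumptions for illustration.

```yaml
# Hypothetical Slack receiver; Secret name, key, and channel are assumptions.
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: slack-notifications
  namespace: monitoring
spec:
  route:
    receiver: slack
    groupBy:
    - alertname
    groupWait: 30s
    repeatInterval: 4h
  receivers:
  - name: slack
    slackConfigs:
    - apiURL:
        name: slack-webhook     # Secret in the same namespace
        key: url                # key holding the Slack webhook URL
      channel: "#alerts"
      sendResolved: true
```

By default the operator scopes an `AlertmanagerConfig` to alerts from its own namespace, and the Alertmanager resource must select it, so cluster-wide routing may need the main Alertmanager configuration instead.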

Resource Usage

Expected resource consumption:

- Prometheus: ~1-2Gi RAM, 0.5-1 CPU
- Grafana: ~256-512Mi RAM, 0.1-0.5 CPU
- Alertmanager: ~128-256Mi RAM, 0.05-0.2 CPU
- Node Exporters: ~50Mi RAM per node
- Kube State Metrics: ~100Mi RAM
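
These estimates can be turned into Kubernetes resource requests and limits. The fragment below is a sketch for Prometheus only, using the lower bound as the request and the upper bound as the limit. That mapping is an assumption rather than a documented setting of this stack. The stanza would go under a container spec or, with the Prometheus Operator, under `spec.resources` of the Prometheus resource.

```yaml
# Sketch of container resource settings derived from the Prometheus estimate above.
# Treat the values as a starting point, not measured figures for this cluster.
resources:
  requests:
    cpu: 500m       # lower bound of the 0.5-1 CPU estimate
    memory: 1Gi     # lower bound of the ~1-2Gi estimate
  limits:
    cpu: "1"        # upper bound of the CPU estimate
    memory: 2Gi     # upper bound of the memory estimate
```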

Security

- All services are accessible via HTTPS with Let's Encrypt certificates
- HTTP automatically redirects to HTTPS
- Change the default Grafana admin password before using the stack in production
- Consider setting up RBAC for Grafana users
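
One way to avoid keeping the Grafana admin password in plain configuration is to store it in a Secret. The sketch below assumes a Secret named `grafana-admin` with keys `admin-user` and `admin-password`; all three names are hypothetical. How Grafana consumes the Secret depends on the deployment. If the community Grafana Helm chart is in use, its `admin.existingSecret`, `admin.userKey`, and `admin.passwordKey` values can point at it.

```yaml
# Hypothetical Secret holding Grafana admin credentials; replace the placeholder value.
apiVersion: v1
kind: Secret
metadata:
  name: grafana-admin
  namespace: monitoring
type: Opaque
stringData:
  admin-user: admin
  admin-password: <choose-a-strong-password>
```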

Troubleshooting

Check component status:

kubectl get pods -n monitoring
kubectl get pvc -n monitoring
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus

Verify ingress:

kubectl get httproute -n monitoring
kubectl get certificate -n gateway-system monitoring-tls