Prometheus Monitoring Stack

Overview

The Prometheus monitoring stack provides comprehensive monitoring for your Kubernetes cluster with:

  • Prometheus - Metrics collection and alerting
  • Grafana - Visualization dashboards
  • Alertmanager - Alert routing and management
  • Node Exporter - Node-level metrics
  • Kube State Metrics - Kubernetes object metrics

Access URLs

After deployment, you can access the monitoring services via:

Features

Grafana Dashboards

Pre-installed dashboards include:

  • Kubernetes cluster overview
  • Node resource usage
  • Pod resource usage
  • Persistent volume monitoring
  • Network policies
  • And many more

Storage

All components use Hetzner Cloud volumes for persistence:

  • Prometheus: 20Gi (30 day retention)
  • Grafana: 5Gi (dashboards and config)
  • Alertmanager: 2Gi (alert history)
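
As a rough illustration of how the Prometheus figures above could be expressed, the following is a minimal sketch of a Prometheus Operator `Prometheus` resource fragment. It assumes the stack is managed by the Prometheus Operator and that the Hetzner CSI driver's storage class is named `hcloud-volumes`; the resource name is hypothetical, and the actual deployment may set these values elsewhere (for example through Helm values).

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus                # hypothetical resource name
  namespace: monitoring
spec:
  retention: 30d                  # matches the 30 day retention stated above
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: hcloud-volumes   # assumed Hetzner CSI storage class name
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi         # matches the Prometheus volume size stated above
```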

Monitoring Coverage

The stack automatically monitors:

  • All cluster nodes (CPU, memory, disk, network)
  • All pods and containers
  • Kubernetes API server
  • CoreDNS
  • Kubelet
  • etcd (if accessible)
  • Custom applications (via ServiceMonitors)

Adding Custom Metrics

To monitor your own applications, create ServiceMonitor resources:

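apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-metrics
  namespace: monitoring
spec:
  # Select Services that carry this label. Without a namespaceSelector,
  # only Services in the ServiceMonitor's own namespace are matched.
  selector:
    matchLabels:
      app: my-app
  endpoints:
  # Name of the Service port that exposes metrics
  - port: metrics
    path: /metrics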
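
For the ServiceMonitor above to find anything, a Service with a matching label and a port named `metrics` has to exist. The following is a minimal sketch of such a Service; the Service name, the pod selector, and port 8080 are illustrative assumptions, not values taken from this stack.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app                # hypothetical Service name
  namespace: monitoring       # same namespace as the ServiceMonitor
  labels:
    app: my-app               # matched by the ServiceMonitor's selector.matchLabels
spec:
  selector:
    app: my-app               # assumed pod label
  ports:
    - name: metrics           # must equal the ServiceMonitor's endpoints[].port
      port: 8080              # assumed port; use whatever your application exposes
      targetPort: 8080
```

The ServiceMonitor refers to the port by name rather than by number, so the port number can change without touching the ServiceMonitor.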

Alerting

Default alert rules are included for:

  • High CPU/Memory usage
  • Pod crash loops
  • Persistent volume issues
  • Node unavailability
  • And more
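
To give a feel for what such a rule looks like, here is a minimal sketch of a custom `PrometheusRule` that fires when a container restarts repeatedly. It is not one of the rules shipped with the stack: the rule name, threshold, and severity label are illustrative assumptions, and whether Prometheus picks the resource up depends on the rule selector configured in your deployment.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-rules                  # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: my-app.rules
      rules:
        - alert: PodRestartingFrequently   # hypothetical alert name
          # kube_pod_container_status_restarts_total is exported by Kube State Metrics
          expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
          for: 5m                          # condition must hold for 5 minutes before firing
          labels:
            severity: warning              # assumed label used for routing
          annotations:
            summary: "Container {{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }} is restarting frequently"
```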

Configure Alertmanager to send notifications via:

  • Email
  • Slack
  • PagerDuty
  • Webhooks
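
As one example, a Slack receiver in Alertmanager's native configuration format might look like the sketch below. The receiver name, channel, and webhook URL are placeholders, and where this configuration lives (a Secret, Helm values, or an operator-managed resource) depends on how the stack was deployed.

```yaml
route:
  receiver: slack-notifications       # default receiver for all alerts
  group_by: ["alertname", "namespace"]
receivers:
  - name: slack-notifications         # hypothetical receiver name
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/WITH/YOUR-WEBHOOK   # placeholder URL
        channel: "#alerts"            # placeholder channel
        send_resolved: true           # also notify when an alert clears
```

Email, PagerDuty, and generic webhooks follow the same pattern with `email_configs`, `pagerduty_configs`, and `webhook_configs` under a receiver.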

Resource Usage

Expected resource consumption:

  • Prometheus: ~1-2Gi RAM, 0.5-1 CPU
  • Grafana: ~256-512Mi RAM, 0.1-0.5 CPU
  • Alertmanager: ~128-256Mi RAM, 0.05-0.2 CPU
  • Node Exporters: ~50Mi RAM per node
  • Kube State Metrics: ~100Mi RAM
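
If you want to reserve capacity based on these figures, one possible mapping for Prometheus is to use the lower bound as the request and the upper bound as the limit. This is only a sketch of a standard Kubernetes `resources` block under that assumption; the stack's actual requests and limits may differ.

```yaml
# Container resources fragment for Prometheus (illustrative mapping, not the stack's defaults)
resources:
  requests:
    cpu: 500m        # lower bound of the 0.5-1 CPU estimate
    memory: 1Gi      # lower bound of the 1-2Gi RAM estimate
  limits:
    cpu: "1"         # upper bound of the CPU estimate
    memory: 2Gi      # upper bound of the RAM estimate
```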

Security

  • All services accessible via HTTPS with Let's Encrypt certificates
  • HTTP automatically redirects to HTTPS
  • Change the default Grafana admin password before using the stack in production
  • Consider setting up RBAC for Grafana users
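
For reference, an HTTP to HTTPS redirect is typically expressed in the Gateway API with a `RequestRedirect` filter on an `HTTPRoute` attached to the plain-HTTP listener. The sketch below shows the general shape only; the route name, Gateway name, listener name, and hostname are hypothetical and are not taken from this stack's manifests.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: monitoring-https-redirect     # hypothetical name
  namespace: monitoring
spec:
  parentRefs:
    - name: example-gateway           # hypothetical Gateway name
      namespace: gateway-system
      sectionName: http               # hypothetical name of the port 80 listener
  hostnames:
    - grafana.example.com             # placeholder hostname
  rules:
    - filters:
        - type: RequestRedirect
          requestRedirect:
            scheme: https             # redirect every request to HTTPS
            statusCode: 301
```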

Troubleshooting

Check component status:

kubectl get pods -n monitoring
kubectl get pvc -n monitoring
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus

Verify ingress routing and the TLS certificate:

kubectl get httproute -n monitoring
kubectl get certificate -n gateway-system monitoring-tls