Prometheus Monitoring Stack
Overview
The Prometheus monitoring stack provides comprehensive monitoring for your Kubernetes cluster with:
- Prometheus - Metrics collection and alerting
- Grafana - Visualization dashboards
- Alertmanager - Alert routing and management
- Node Exporter - Node-level metrics
- Kube State Metrics - Kubernetes object metrics
Access URLs
After deployment, you can access the monitoring services via:
- Grafana: https://grafana.skatzi.com
  - Username: admin
  - Password: admin123 (change this!)
- Prometheus: https://prometheus.skatzi.com
  - Direct access to the Prometheus query interface
- Alertmanager: https://alertmanager.skatzi.com
  - Alert management interface
Features
Grafana Dashboards
Pre-installed dashboards include:
- Kubernetes cluster overview
- Node resource usage
- Pod resource usage
- Persistent volume monitoring
- Network policies
- And many more!
Storage
All components use Hetzner Cloud volumes for persistence:
- Prometheus: 20Gi (30 day retention)
- Grafana: 5Gi (dashboards and config)
- Alertmanager: 2Gi (alert history)
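If the stack is managed by the Prometheus Operator (an assumption, suggested by the use of ServiceMonitors), the Prometheus volume size and retention listed above might be expressed roughly as in the fragment below. The resource name `prometheus` and the storage class `hcloud-volumes` are assumptions, and a real Prometheus resource needs further fields; run `kubectl get storageclass` to find the class name in your cluster.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus            # hypothetical resource name
  namespace: monitoring
spec:
  retention: 30d              # the 30 day retention listed above
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: hcloud-volumes   # assumed Hetzner CSI storage class
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi     # the Prometheus volume size listed above
```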
Monitoring Coverage
The stack automatically monitors:
- All cluster nodes (CPU, memory, disk, network)
- All pods and containers
- Kubernetes API server
- CoreDNS
- Kubelet
- etcd (if accessible)
- Custom applications (via ServiceMonitors)
Adding Custom Metrics
To monitor your own applications, create ServiceMonitor resources:
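apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-metrics
  namespace: monitoring
spec:
  # Select Services that carry the label app=my-app
  selector:
    matchLabels:
      app: my-app
  endpoints:
    # Name of the Service port to scrape, and the HTTP path serving metrics
    - port: metrics
      path: /metrics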
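A ServiceMonitor selects Services rather than pods, and its `port` field refers to a Service port by name. The sketch below shows a Service that the ServiceMonitor above would match, assuming a hypothetical application `my-app` whose pods serve metrics on port 8080 (the port number and pod labels are assumptions). Because the ServiceMonitor sets no `namespaceSelector`, it only discovers Services in its own namespace, so this Service lives in `monitoring` as well.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app                # hypothetical application name
  namespace: monitoring       # same namespace as the ServiceMonitor
  labels:
    app: my-app               # matched by the ServiceMonitor's selector
spec:
  selector:
    app: my-app               # assumed pod label
  ports:
    - name: metrics           # must equal the ServiceMonitor's endpoint port name
      port: 8080              # assumed metrics port
      targetPort: 8080
```

To monitor an application in another namespace, add a `namespaceSelector` to the ServiceMonitor instead of moving the Service.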
Alerting
Default alert rules are included for:
- High CPU/Memory usage
- Pod crash loops
- Persistent volume issues
- Node unavailability
- And more...
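Custom rules can sit alongside the defaults. The following is a minimal sketch of a PrometheusRule that fires when a container restarts repeatedly, assuming the Prometheus Operator CRDs are installed and kube-state-metrics is being scraped. The rule name, threshold, and severity label are illustrative choices, and depending on how the Prometheus resource selects rules, additional metadata labels may be required before the rule is loaded.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-rules          # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: my-app.rules
      rules:
        - alert: PodRestartingFrequently
          # More than 3 container restarts within 15 minutes (illustrative threshold)
          expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Container {{ $labels.container }} in pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting frequently"
```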
Configure Alertmanager to send notifications via:
- Email
- Slack
- PagerDuty
- Webhooks
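As one example of these options, a minimal Alertmanager configuration that routes every alert to a single Slack channel could look like the sketch below. The receiver name, channel, timing values, and webhook URL are placeholders, and how this configuration is supplied to Alertmanager depends on how the stack was deployed.

```yaml
route:
  receiver: slack-notifications       # default receiver for all alerts
  group_by: ['alertname', 'namespace']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
receivers:
  - name: slack-notifications
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/WITH/YOUR-WEBHOOK  # placeholder URL
        channel: '#alerts'            # hypothetical channel
        send_resolved: true           # also notify when an alert clears
```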
Resource Usage
Expected resource consumption:
- Prometheus: ~1-2Gi RAM, 0.5-1 CPU
- Grafana: ~256-512Mi RAM, 0.1-0.5 CPU
- Alertmanager: ~128-256Mi RAM, 0.05-0.2 CPU
- Node Exporters: ~50Mi RAM per node
- Kube State Metrics: ~100Mi RAM
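These figures can be turned into container requests and limits. The fragment below is a sketch for Prometheus only, using the lower bound of each range as the request and the upper bound as the limit. Where the block belongs (a Helm values file, a custom resource, or a Deployment) depends on how the stack is installed, which this page does not specify.

```yaml
resources:
  requests:
    cpu: 500m       # lower end of the 0.5-1 CPU estimate
    memory: 1Gi     # lower end of the 1-2Gi RAM estimate
  limits:
    cpu: "1"        # upper end of the CPU estimate
    memory: 2Gi     # upper end of the RAM estimate
```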
Security
- All services are accessible via HTTPS with Let's Encrypt certificates
- HTTP automatically redirects to HTTPS
- Grafana admin password should be changed in production
- Consider setting up RBAC for Grafana users
Troubleshooting
Check component status:
kubectl get pods -n monitoring
kubectl get pvc -n monitoring
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus
Verify ingress: