Hetzner Cloud + MetalLB: Networking Challenges and Solutions
Overview
This document explains the networking challenges encountered when deploying MetalLB on Hetzner Cloud and the architectural decisions made to ensure reliable LoadBalancer functionality.
The Problem: Hetzner Cloud Networking Limitations
Floating IP Assignment Constraints
Hetzner Cloud has specific limitations regarding floating IP assignment and Layer 2 networking:
- Single Server Assignment: Floating IPs can only be assigned to one server at a time through the Hetzner Cloud API
- ARP/Layer 2 Limitations: Hetzner Cloud's network infrastructure has restrictions on ARP announcements from multiple servers
- BGP Not Available: Hetzner Cloud doesn't support BGP announcements for customer networks
MetalLB Mode Considerations
MetalLB offers two main modes for IP assignment:
Layer 2 Mode (L2)
- How it works: Uses ARP/NDP announcements to claim IP addresses
- Pros: Simple setup, no external dependencies
- Cons: Only one node announces each IP, so all traffic for that IP passes through a single node and failover relies on MetalLB electing a new announcer
BGP Mode
- How it works: Uses BGP protocol to announce routes
- Pros: True load balancing, high availability
- Cons: Requires BGP-capable network infrastructure (not available on Hetzner Cloud)
Our Solution: Single Control Plane with L2 Mode
Architecture Decision
Given Hetzner Cloud's limitations, we implemented the following architecture:
    ┌─────────────────────────────────────────────────────────────────┐
    │                      Hetzner Cloud Network                      │
    │                                                                 │
    │  ┌─────────────────┐    ┌──────────────────────────────────┐    │
    │  │  Control Plane  │    │          Worker Nodes            │    │
    │  │  (Single Node)  │    │                                  │    │
    │  │                 │    │  ┌─────────┐ ┌─────────┐ ┌─────┐ │    │
    │  │  Primary IP:    │    │  │Worker-1 │ │Worker-2 │ │ ... │ │    │
    │  │  46.224.x.x     │    │  └─────────┘ └─────────┘ └─────┘ │    │
    │  │                 │    │                                  │    │
    │  │  Floating IP:   │    │                                  │    │
    │  │  116.202.177.24 │◄───┼─── MetalLB L2 Advertisement      │    │
    │  │                 │    │                                  │    │
    │  └─────────────────┘    └──────────────────────────────────┘    │
    └─────────────────────────────────────────────────────────────────┘
Key Configuration Details
1. Floating IP Assignment
    # Terraform configuration
    resource "hcloud_floating_ip_assignment" "cluster_ip_assignment" {
      floating_ip_id = hcloud_floating_ip.cluster_ip.id
      server_id      = hcloud_server.control_plane[0].id # Single control plane
    }
2. MetalLB L2Advertisement Configuration
    apiVersion: metallb.io/v1beta1
    kind: L2Advertisement
    metadata:
      name: l2-advertisement
      namespace: metallb-system
    spec:
      ipAddressPools:
        - ip-address-pool
3. IP Address Pool Configuration
    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      name: ip-address-pool
      namespace: metallb-system
    spec:
      addresses:
        - 116.202.177.24/32 # Single floating IP
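A Service draws from this pool by requesting type LoadBalancer. The following is a minimal sketch rather than a manifest from this deployment: the Service name, namespace, selector, and sharing key are hypothetical, and the annotation names are the ones documented for MetalLB 0.13-era releases, so they should be checked against the installed version. Because the pool holds a single address, a second LoadBalancer Service would stay pending unless both Services opt into IP sharing with the same key and use different ports.

```yaml
# Hypothetical Service that requests the single floating IP from the pool.
apiVersion: v1
kind: Service
metadata:
  name: ingress-controller            # hypothetical name
  namespace: ingress                  # hypothetical namespace
  annotations:
    # Pin the Service to the pool defined above.
    metallb.universe.tf/address-pool: ip-address-pool
    # Services carrying the same sharing key may reuse one IP on different ports.
    metallb.universe.tf/allow-shared-ip: floating-ip   # hypothetical key
spec:
  type: LoadBalancer
  selector:
    app: ingress-controller           # hypothetical pod label
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443
```

Routing all HTTP/HTTPS traffic through one ingress Service like this is a common way to live with a single external IP, since further applications can then be exposed as Ingress resources instead of additional LoadBalancer Services.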
Why Single Control Plane?
Technical Constraints
- Hetzner Floating IP Limitation:
  - Floating IPs can only be assigned to one server via Hetzner API
  - Multiple servers cannot claim the same floating IP simultaneously
- ARP Announcement Issues:
  - Hetzner's network may filter or block ARP announcements from "unauthorized" sources
  - Only the server with the assigned floating IP can reliably announce it
- MetalLB L2 Mode Behavior:
  - In L2 mode, MetalLB elects a single node to announce each IP
  - This works well with Hetzner's single-server floating IP model
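Since only the server holding the floating IP can announce it reliably, it can help to restrict which node MetalLB is allowed to elect. The sketch below is a variant of the L2Advertisement shown earlier with a node selector added. It assumes the control plane node carries the standard node-role.kubernetes.io/control-plane label and that a MetalLB speaker pod is scheduled on that node; neither is confirmed by the configuration in this document.

```yaml
# Sketch: limit L2 announcements to the node that owns the floating IP.
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2-advertisement
  namespace: metallb-system
spec:
  ipAddressPools:
    - ip-address-pool
  nodeSelectors:
    # Assumed label; verify it against the labels actually present on the node.
    - matchLabels:
        node-role.kubernetes.io/control-plane: ""
```

With a selector like this, a Service would lose its announcer if the control plane node goes down, rather than moving to a worker that Hetzner would not route the floating IP to anyway.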
Avoided Issues
By using a single control plane approach, we avoid:
- Split-brain scenarios where multiple nodes try to claim the same IP
- ARP conflicts that could cause network instability
- Unreliable failover due to Hetzner's network filtering
- Complex BGP setup that isn't supported on Hetzner Cloud
Network Interface Configuration
Talos Cluster Patch Considerations
We experimented with various network configurations in the Talos cluster patch:
    # Initially tried explicit interface configuration
    machine:
      network:
        interfaces:
          - interface: enp1s0
            addresses:
              - PRIMARY_IP/26 # Primary server IP
              - FLOATING_IP/32 # Floating IP
            routes:
              - network: 0.0.0.0/0
                gateway: GATEWAY_IP
Result: This caused network connectivity issues and DNS resolution problems.
Final Configuration
We simplified to let Hetzner Cloud handle the primary networking:
    cluster:
      controlPlane:
        endpoint: https://PRIMARY_IP:6443 # Use primary IP for k8s API
      network:
        dnsDomain: cluster.local
        podSubnets:
          - 10.244.0.0/16
        serviceSubnets:
          - 10.96.0.0/12
    # No machine network configuration - let Hetzner/DHCP handle it
Firewall Configuration
Required Ports
The Hetzner Cloud firewall must allow:
    # LoadBalancer services (HTTP/HTTPS)
    rule {
      direction  = "in"
      protocol   = "tcp"
      port       = "80"
      source_ips = ["0.0.0.0/0"]
    }
    rule {
      direction  = "in"
      protocol   = "tcp"
      port       = "443"
      source_ips = ["0.0.0.0/0"]
    }
    # Add other application-specific ports as needed
Limitations and Trade-offs
Current Limitations
- Single Point of Failure: The control plane is a single point of failure for LoadBalancer services
- No High Availability: LoadBalancer IPs cannot failover to other nodes automatically
- Scaling Constraints: All LoadBalancer traffic flows through one node
Potential Future Solutions
- Multiple Floating IPs:
  - Assign different floating IPs to different services
  - Use multiple control planes, each with its own floating IP
- External Load Balancer:
  - Use Hetzner Cloud Load Balancer service
  - Route traffic to NodePort services
- Custom CNI Solutions:
  - Implement custom networking with proper ARP handling
  - Requires deep understanding of Hetzner's network behavior
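For the external load balancer option, the cluster side would typically expose a fixed port on every node that the Hetzner Cloud Load Balancer can target. The following is a rough sketch of such a NodePort Service; the name, selector, and port numbers are hypothetical and not part of the current setup.

```yaml
# Hypothetical NodePort Service to sit behind an external load balancer.
apiVersion: v1
kind: Service
metadata:
  name: web-nodeport        # hypothetical name
  namespace: default
spec:
  type: NodePort
  selector:
    app: web                # hypothetical pod label
  ports:
    - name: http
      port: 80              # port inside the cluster
      targetPort: 8080      # hypothetical container port
      nodePort: 30080       # must fall in the NodePort range (30000-32767 by default)
```

The external load balancer would then forward to port 30080 on each node, which removes the dependency on a single node announcing the floating IP.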
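Testing and Validation
Connectivity Tests
- Internal connectivity: request the LoadBalancer IP from a node or pod inside the cluster
- External connectivity: request the LoadBalancer IP from a machine outside the cluster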
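For the internal check, one option is a short-lived pod that requests the LoadBalancer IP from inside the cluster. This is only a sketch: the pod name is hypothetical, the image is the public curlimages/curl image, and it assumes an HTTP service is listening on port 80 of the floating IP.

```yaml
# Hypothetical one-shot pod that probes the LoadBalancer IP from inside the cluster.
apiVersion: v1
kind: Pod
metadata:
  name: lb-connectivity-check   # hypothetical name
  namespace: default
spec:
  restartPolicy: Never
  containers:
    - name: curl
      image: curlimages/curl
      # Print only the HTTP status code; give up after 5 seconds.
      command: ["curl", "-sS", "--max-time", "5", "-o", "/dev/null", "-w", "%{http_code}", "http://116.202.177.24/"]
```

The pod's log then contains either an HTTP status code or curl's error message, which distinguishes a timeout from a refused connection.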
Expected Results
- Before firewall fix: Connection timeout (firewall blocking)
- After firewall fix: Connection refused (service not ready) or successful connection
- Internal connectivity: Always works, provided MetalLB is announcing the IP correctly
Best Practices
Deployment Order
1. Deploy Kubernetes cluster with single control plane
2. Assign floating IP to control plane server via Terraform
3. Deploy MetalLB with L2 mode configuration
4. Configure Hetzner Cloud firewall rules
5. Deploy applications with LoadBalancer services
Monitoring
Monitor the following for network health:
- MetalLB speaker logs for ARP announcements
- Service external IP assignment status
- Connectivity from both internal and external sources
Security Considerations
- Minimize exposed ports in Hetzner Cloud firewall
- Use TLS/HTTPS for all external services
- Consider network policies within the cluster
- Regularly audit firewall rules
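As a starting point for in-cluster network policies, a default-deny ingress policy per namespace is a common baseline. The sketch below uses a hypothetical namespace, and it only has an effect if the cluster's CNI plugin enforces NetworkPolicy, which this document does not specify.

```yaml
# Sketch: deny all ingress to pods in one namespace unless another policy allows it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: default        # hypothetical; apply per application namespace
spec:
  podSelector: {}           # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
```

Traffic that should reach a workload, including traffic arriving through the LoadBalancer IP, would then need an explicit allow policy.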
GitOps Architecture with Flux
Component Separation Strategy
Based on recent deployment challenges, we implemented a two-phase component architecture to resolve CRD dependency issues:
Phase 1: MetalLB Installation Component
    components/metallb-install/
    ├── namespace.yaml       # metallb-system namespace
    ├── helmrepository.yaml  # MetalLB Helm repository
    ├── helmrelease.yaml     # MetalLB Helm chart installation
    └── kustomization.yaml   # Kustomize configuration
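The contents of helmrepository.yaml and helmrelease.yaml are not reproduced in this document. A plausible minimal version might look like the following, using the API versions named in the Resolved Issues section and the public MetalLB chart repository; the intervals and the absence of a pinned chart version are assumptions.

```yaml
# Sketch of helmrepository.yaml (interval is an assumed value).
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: metallb
  namespace: metallb-system
spec:
  interval: 1h
  url: https://metallb.github.io/metallb
---
# Sketch of helmrelease.yaml; name and namespace match the Flux health check.
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: metallb
  namespace: metallb-system
spec:
  interval: 10m
  chart:
    spec:
      chart: metallb
      sourceRef:
        kind: HelmRepository
        name: metallb
        namespace: metallb-system
```

In practice the chart version would usually be pinned under spec.chart.spec.version so that upgrades are deliberate.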
Phase 2: MetalLB Configuration Component
    components/metallb-config/
    ├── ipaddresspool.yaml    # IP address pool configuration
    ├── l2advertisement.yaml  # L2 advertisement settings
    └── kustomization.yaml    # Kustomize configuration
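The kustomization.yaml in this directory is a plain Kustomize file that lists the two manifests. Its exact contents are not shown in this document, so the following is an assumed minimal form.

```yaml
# Sketch of components/metallb-config/kustomization.yaml (assumed contents).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ipaddresspool.yaml
  - l2advertisement.yaml
```

Keeping this file free of the Helm resources is what lets Flux apply it only after the MetalLB CRDs exist.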
Flux Kustomization Structure
The cluster-level Flux Kustomizations ensure proper deployment ordering:
    # cluster/hetzner-mgmt/metallb/kustomization.yaml
    apiVersion: kustomize.toolkit.fluxcd.io/v1
    kind: Kustomization
    metadata:
      name: metallb-install
      namespace: flux-system
    spec:
      interval: 10m0s
      path: ./components/metallb-install
      prune: true
      sourceRef:
        kind: GitRepository
        name: bootstrap-cluster
      healthChecks:
        - apiVersion: helm.toolkit.fluxcd.io/v2beta2
          kind: HelmRelease
          name: metallb
          namespace: metallb-system
    ---
    apiVersion: kustomize.toolkit.fluxcd.io/v1
    kind: Kustomization
    metadata:
      name: metallb-config
      namespace: flux-system
    spec:
      interval: 10m0s
      path: ./components/metallb-config
      prune: true
      sourceRef:
        kind: GitRepository
        name: bootstrap-cluster
      dependsOn:
        - name: metallb-install
      postBuild:
        substituteFrom:
          - kind: ConfigMap
            name: cluster-config
            optional: false
Key Benefits of This Architecture
- CRD Dependency Resolution: Installation phase creates MetalLB CRDs before configuration phase attempts to use them
- Clean Component Separation: Each phase has distinct responsibilities and can be managed independently
- Flux Health Checks: Installation phase waits for HelmRelease readiness before proceeding
- ConfigMap Substitution: Dynamic IP configuration through Flux's postBuild.substituteFrom
- Dependency Ordering: dependsOn ensures installation completes before configuration
Resolved Issues
CRD Validation Errors
Problem: Terraform planning failed with "the server could not find the requested resource" for MetalLB CRDs.
Solution:
- Separated installation from configuration components
- Used Flux Kustomizations with proper dependsOn relationships
- Moved from kubernetes_manifest to kubectl apply via null_resource in Terraform
API Version Compatibility
Problem: Mismatched API versions between different MetalLB versions and Flux components.
Solution:
- Inspected actual CRDs using kubectl get crds
- Updated to correct API versions: HelmRepository v1beta2, HelmRelease v2beta2
- Used Helm chart deployment for consistent API versions
Template Variable Substitution
Problem: Static IP configuration in YAML files caused Git conflicts with dynamic infrastructure.
Solution:
- Used template variables like ${metallb_floating_ip} in configuration files
- Leveraged Flux's postBuild.substituteFrom with ConfigMap containing dynamic values
- Maintained git-trackable configuration while supporting dynamic IP assignment
Deployment Automation
Terraform Integration
The 2-components Terraform configuration creates the ConfigMap that holds the cluster values:
    # Create ConfigMap with cluster configuration
    resource "kubernetes_config_map" "cluster_config" {
      metadata {
        name      = "cluster-config"
        namespace = "flux-system"
      }
      data = {
        metallb_floating_ip = data.terraform_remote_state.bootstrap.outputs.floating_ip
        cluster_name        = var.cluster_name
        # Add other dynamic values as needed
      }
    }
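Once applied, the resource above should result in a ConfigMap in the cluster roughly like the one below. The floating IP is the one used throughout this document; the cluster name is an assumed value for var.cluster_name, inferred from the cluster/hetzner-mgmt path.

```yaml
# Approximate rendered ConfigMap that Flux reads during postBuild substitution.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-config
  namespace: flux-system
data:
  metallb_floating_ip: "116.202.177.24"
  cluster_name: hetzner-mgmt   # assumed value of var.cluster_name
```

Each key under data becomes available as a ${key} variable in the manifests of any Flux Kustomization that lists this ConfigMap under postBuild.substituteFrom.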
Template-Based Configuration
IP address pools use template variables for dynamic substitution:
    # components/metallb-config/ipaddresspool.yaml
    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      name: ip-address-pool
      namespace: metallb-system
    spec:
      addresses:
        - ${metallb_floating_ip}/32 # Substituted by Flux from ConfigMap
Conclusion
While Hetzner Cloud's networking limitations require a single control plane architecture for MetalLB L2 mode, this solution provides:
- Reliable LoadBalancer functionality
- Predictable network behavior
- Simple configuration and maintenance
- Cost-effective solution for many use cases
- Automated GitOps deployment with proper dependency management
- Template-based dynamic configuration avoiding Git conflicts
- Two-phase component architecture resolving CRD timing issues
For high-availability requirements, consider using Hetzner Cloud Load Balancer service or implementing multiple floating IPs with service-specific assignment patterns.