Kubernetes networking is complex and a frequent source of frustration for administrators. This guide walks you through systematically diagnosing and resolving common networking issues in your Kubernetes cluster.
Understanding Kubernetes Networking Components
Before diving into troubleshooting, it’s important to understand the key components of Kubernetes networking:
- Pods: Basic unit with its own IP address in a virtual network
- Services: Stable endpoint to access pods, regardless of pod lifecycle
- Ingress: API object that manages external access to services
- Network Policies: Specifications for how groups of pods can communicate
- CNI (Container Network Interface): Plugin that configures pod networks
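To see how these components fit together, here is a minimal, illustrative Service manifest (the name and labels are hypothetical); any pod whose labels match the selector becomes an endpoint of the service:
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web          # must match the labels on the target pods
  ports:
  - port: 80          # port the service exposes
    targetPort: 8080  # port the container actually listens on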
Step-by-Step Troubleshooting Process
1. Verify Service Configuration
# List all services in the namespace
kubectl get svc -n <namespace>
# Check details of a specific service
kubectl describe svc <service-name> -n <namespace>
Check for:
- Correct selector labels matching pod labels
- Appropriate port configurations
- Endpoints (if none, service won’t route traffic)
- Service type (ClusterIP, NodePort, LoadBalancer)
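For reference, a healthy service shows a populated Endpoints field in its describe output; the snippet below is illustrative:
# kubectl describe svc <service-name> -n <namespace> (abridged, illustrative)
# Selector:   app=web
# Type:       ClusterIP
# Port:       <unset>  80/TCP
# TargetPort: 8080/TCP
# Endpoints:  10.244.1.8:8080,10.244.2.9:8080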
2. Verify Pod-Service Connectivity
Check if the service’s selector matches the pods:
# Get selector from service
kubectl get svc <service-name> -n <namespace> -o jsonpath='{.spec.selector}'
# Find pods matching the selector
kubectl get pods -l <key>=<value> -n <namespace>
Check if the service has endpoints:
# List service endpoints
kubectl get endpoints <service-name> -n <namespace>
# If no endpoints are listed, the service isn't connected to any pods
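On Kubernetes v1.21 and later, EndpointSlices are the primary endpoint API; you can query the slices backing a service via the standard service-name label:
# List the EndpointSlices backing a service (v1.21+)
kubectl get endpointslices -n <namespace> -l kubernetes.io/service-name=<service-name>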
3. Test Network Connectivity
Deploy a debugging pod to test network connectivity:
# Create a temporary debug pod
kubectl run network-debug --rm -it --image=nicolaka/netshoot -n <namespace> -- bash
# From inside the pod, test DNS resolution
nslookup <service-name>
nslookup <service-name>.<namespace>.svc.cluster.local
# Test connectivity to the service
curl <service-ip>:<port>
curl <service-name>:<port>
# Test pod-to-pod connectivity
ping <pod-ip>
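Keep in mind that some CNI plugins and network policies block ICMP, so a failed ping is not conclusive on its own. A TCP-level check with netcat (included in the netshoot image) is more reliable:
# Test TCP connectivity to a specific pod port
nc -zv <pod-ip> <port>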
4. Check Network Policies
Network policies can block traffic between pods:
# List network policies
kubectl get networkpolicy -n <namespace>
# Check details of network policies
kubectl describe networkpolicy <policy-name> -n <namespace>
If restrictive network policies are in place, ensure they allow necessary traffic:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-traffic
  namespace: <namespace>
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - port: 8080
      protocol: TCP
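Keep in mind that network policies are deny-by-default once they apply: as soon as any policy selects a pod, all ingress traffic not explicitly allowed is dropped. The policy above therefore permits frontend-to-backend traffic on port 8080 while implicitly blocking all other ingress to app: backend pods.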
5. Diagnose DNS Issues
Kubernetes uses CoreDNS for service discovery:
# Check if CoreDNS pods are running
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns
# Test DNS resolution from within a pod
kubectl exec -it <pod-name> -n <namespace> -- nslookup kubernetes.default.svc.cluster.local
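It is also worth reviewing the CoreDNS configuration, which most distributions store in a ConfigMap in kube-system:
# Inspect the CoreDNS Corefile
kubectl get configmap coredns -n kube-system -o yaml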
6. Troubleshoot Ingress Issues
If external access is not working:
# Check ingress status
kubectl get ingress -n <namespace>
# Get detailed ingress information
kubectl describe ingress <ingress-name> -n <namespace>
# Check ingress controller logs (for ingress-nginx; labels vary by controller)
kubectl logs -n <ingress-controller-namespace> -l app.kubernetes.io/name=ingress-nginx
Common Ingress issues:
- Incorrect host configuration
- TLS certificate problems
- Ingress controller not properly deployed
- Backend services not available
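As a reference point, a minimal working Ingress looks like the sketch below (the hostname and service name are illustrative, and ingressClassName assumes the ingress-nginx controller):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  namespace: <namespace>
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 80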
7. Check CNI Configuration
Issues with the CNI plugin can cause network connectivity problems:
# Check CNI plugin pods (labels and namespaces vary by plugin and version)
kubectl get pods -n kube-system -l k8s-app=calico-node   # For Calico
kubectl get pods -n kube-system -l k8s-app=cilium        # For Cilium
kubectl get pods -n kube-flannel -l app=flannel          # For Flannel (older installs use kube-system)
# Check CNI plugin logs
kubectl logs -n kube-system <cni-pod-name>
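If a CNI pod is crash-looping, it can also help to inspect the CNI configuration on the node itself (requires SSH or node shell access):
# On the node: list installed CNI configurations
ls /etc/cni/net.d/
# On the node: verify the CNI binaries are present
ls /opt/cni/bin/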
8. Diagnosing LoadBalancer Service Issues
For LoadBalancer type services:
# Check service status
kubectl get svc <service-name> -n <namespace>
# Look for events related to the service
kubectl describe svc <service-name> -n <namespace>
# Check cloud provider integration logs (label and namespace vary by provider)
kubectl logs -n kube-system -l k8s-app=cloud-controller-manager
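If the EXTERNAL-IP column stays at <pending>, the cluster has no working load-balancer integration: on a cloud provider this usually points to a cloud-controller-manager problem, while on bare metal you need a load-balancer implementation such as MetalLB.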
Practical Example: Resolving Service Connectivity Issues
Let’s work through a practical example where a frontend service can’t communicate with a backend service:
# Verify both services are running
kubectl get svc -n app-namespace
# NAME       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
# frontend   ClusterIP   10.96.45.157   <none>        80/TCP     1h
# backend    ClusterIP   10.96.78.202   <none>        8080/TCP   1h
# Check if backend has endpoints
kubectl get endpoints backend -n app-namespace
# NAME      ENDPOINTS   AGE
# backend   <none>      1h
# No endpoints, so check pod labels
kubectl get pods -l app=backend -n app-namespace
# No resources found
# Check what pods actually exist
kubectl get pods -n app-namespace
# NAME                        READY   STATUS    RESTARTS   AGE
# backend-7c9b4f5869-8k2vl    1/1     Running   0          1h
# frontend-5d4d7b8476-2xvz7   1/1     Running   0          1h
# Check the backend pod's labels
kubectl describe pod backend-7c9b4f5869-8k2vl -n app-namespace
# Labels: app=backend-service <-- Mismatch with service selector
# Update the service to match the correct labels
kubectl patch svc backend -n app-namespace -p '{"spec":{"selector":{"app":"backend-service"}}}'
# Verify endpoints are now created
kubectl get endpoints backend -n app-namespace
# NAME      ENDPOINTS          AGE
# backend   10.244.2.15:8080   1h15m
# Test connectivity from frontend to backend
kubectl exec -it frontend-5d4d7b8476-2xvz7 -n app-namespace -- curl backend:8080
# Response from backend service
Common Networking Issues and Solutions
1. Service Has No Endpoints
Issue: Service selector doesn’t match any pod labels. Solution: Ensure pod labels match service selector.
# Find the pod labels
kubectl get pod <pod-name> -n <namespace> --show-labels
# Update service selector to match pod labels
kubectl patch svc <service-name> -n <namespace> -p '{"spec":{"selector":{"app":"correct-label"}}}'
2. DNS Resolution Fails
Issue: CoreDNS is not functioning correctly. Solution: Check and fix CoreDNS deployment.
# Restart CoreDNS pods
kubectl rollout restart deployment coredns -n kube-system
# Inspect the DNS configuration a pod actually receives
kubectl exec -it <pod-name> -n <namespace> -- cat /etc/resolv.conf
3. Network Policy Blocking Traffic
Issue: Overly restrictive network policies. Solution: Update network policies to allow necessary traffic.
# Create a temporary permissive policy for debugging
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-temporarily
  namespace: <namespace>
spec:
  podSelector: {}
  ingress:
  - {}
  egress:
  - {}
  policyTypes:
  - Ingress
  - Egress
EOF
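Remember to delete this policy as soon as debugging is complete, since it removes all network segmentation in the namespace:
# Remove the temporary policy after debugging
kubectl delete networkpolicy allow-all-temporarily -n <namespace>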
4. Ingress Not Routing Traffic
Issue: Misconfigured ingress or ingress controller issues. Solution: Check and fix ingress configuration.
# Verify ingress class is correct
kubectl get ingressclass
# Update ingress to use the correct class
kubectl patch ingress <ingress-name> -n <namespace> -p '{"spec":{"ingressClassName":"nginx"}}'
5. Pod CIDR Conflicts
Issue: Pod CIDR ranges conflict across nodes. Solution: Reconfigure the CNI plugin or node network configuration.
# Check pod CIDR allocation per node
kubectl get nodes -o custom-columns=NAME:.metadata.name,POD-CIDR:.spec.podCIDR
Preventive Measures
- Document Your Network Architecture: Keep detailed documentation of your network design, including CIDR ranges, services, and policies.
- Implement Network Observability: Use tools like Cilium Hubble or Calico’s network policy logs to monitor and visualize network traffic.
- Use Standard Network Troubleshooting Pods: Create a standard debug DaemonSet that includes networking tools:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: network-diagnostic
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: network-diagnostic
  template:
    metadata:
      labels:
        app: network-diagnostic
    spec:
      hostNetwork: true
      containers:
      - name: network-tools
        image: nicolaka/netshoot
        command: ["sleep", "infinity"]
        securityContext:
          privileged: true
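With the DaemonSet in place, you can open a shell in the diagnostic pod on any given node; the spec.nodeName field selector narrows the pod list to that node:
# Exec into the diagnostic pod running on a specific node
kubectl exec -it -n kube-system \
  $(kubectl get pod -n kube-system -l app=network-diagnostic \
    --field-selector spec.nodeName=<node-name> -o name) -- bash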
- Create Network Validation Tests: Implement automated tests that validate network connectivity between components.
By following this systematic approach to troubleshooting Kubernetes networking issues, you can effectively diagnose and resolve connectivity problems in your cluster, ensuring reliable communication between services.