Kubernetes networking is complex and a frequent source of frustration for administrators. This guide walks you through systematically diagnosing and resolving common networking issues in your Kubernetes cluster.
Understanding Kubernetes Networking Components
Before diving into troubleshooting, it’s important to understand the key components of Kubernetes networking:
- Pods: Basic unit with its own IP address in a virtual network
- Services: Stable endpoint to access pods, regardless of pod lifecycle
- Ingress: API object that manages external access to services
- Network Policies: Specifications for how groups of pods can communicate
- CNI (Container Network Interface): Plugin that configures pod networks
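To see how these components fit together, here is a minimal, illustrative Service manifest (the name and labels are hypothetical); any pod whose labels match the selector becomes an endpoint of the service:
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web          # must match the labels on the target pods
  ports:
  - port: 80          # port the service exposes
    targetPort: 8080  # port the container actually listens on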
Step-by-Step Troubleshooting Process
1. Verify Service Configuration
# List all services in the namespace
kubectl get svc -n <namespace>
# Check details of a specific service
kubectl describe svc <service-name> -n <namespace>
Check for:
- Correct selector labels matching pod labels
- Appropriate port configurations
- Endpoints (if none, service won’t route traffic)
- Service type (ClusterIP, NodePort, LoadBalancer)
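For reference, a healthy service shows a populated Endpoints field in its describe output; the snippet below is illustrative:
# kubectl describe svc <service-name> -n <namespace> (abridged, illustrative)
# Selector:   app=web
# Type:       ClusterIP
# Port:       <unset>  80/TCP
# TargetPort: 8080/TCP
# Endpoints:  10.244.1.8:8080,10.244.2.9:8080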
2. Verify Pod-Service Connectivity
Check if the service’s selector matches the pods:
# Get selector from service
kubectl get svc <service-name> -n <namespace> -o jsonpath='{.spec.selector}'
# Find pods matching the selector
kubectl get pods -l <key>=<value> -n <namespace>
Check if the service has endpoints:
# List service endpoints
kubectl get endpoints <service-name> -n <namespace>
# If no endpoints are listed, the service isn't connected to any pods
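On Kubernetes v1.21 and later, EndpointSlices are the primary endpoint API; you can query the slices backing a service via the standard service-name label:
# List the EndpointSlices backing a service (v1.21+)
kubectl get endpointslices -n <namespace> -l kubernetes.io/service-name=<service-name>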
3. Test Network Connectivity
Deploy a debugging pod to test network connectivity:
# Create a temporary debug pod
kubectl run network-debug --rm -it --image=nicolaka/netshoot -n <namespace> -- bash
# From inside the pod, test DNS resolution
nslookup <service-name>
nslookup <service-name>.<namespace>.svc.cluster.local
# Test connectivity to the service
curl <service-ip>:<port>
curl <service-name>:<port>
# Test pod-to-pod connectivity
ping <pod-ip>
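Keep in mind that some CNI plugins and network policies block ICMP, so a failed ping is not conclusive on its own. A TCP-level check with netcat (included in the netshoot image) is more reliable:
# Test TCP connectivity to a specific pod port
nc -zv <pod-ip> <port>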
4. Check Network Policies
Network policies can block traffic between pods:
# List network policies
kubectl get networkpolicy -n <namespace>
# Check details of network policies
kubectl describe networkpolicy <policy-name> -n <namespace>
If restrictive network policies are in place, ensure they allow necessary traffic:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-traffic
  namespace: <namespace>
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - port: 8080
      protocol: TCP
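Keep in mind that network policies are deny-by-default once they apply: as soon as any policy selects a pod, all ingress traffic not explicitly allowed is dropped. The policy above therefore permits frontend-to-backend traffic on port 8080 while implicitly blocking all other ingress to app: backend pods.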
5. Diagnose DNS Issues
Kubernetes uses CoreDNS for service discovery:
# Check if CoreDNS pods are running
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns
# Test DNS resolution from within a pod
kubectl exec -it <pod-name> -n <namespace> -- nslookup kubernetes.default.svc.cluster.local
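It is also worth reviewing the CoreDNS configuration, which most distributions store in a ConfigMap in kube-system:
# Inspect the CoreDNS Corefile
kubectl get configmap coredns -n kube-system -o yaml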
6. Troubleshoot Ingress Issues
If external access is not working:
# Check ingress status
kubectl get ingress -n <namespace>
# Get detailed ingress information
kubectl describe ingress <ingress-name> -n <namespace>
# Check ingress controller logs (for ingress-nginx; labels vary by controller)
kubectl logs -n <ingress-controller-namespace> -l app.kubernetes.io/name=ingress-nginx
Common Ingress issues:
- Incorrect host configuration
- TLS certificate problems
- Ingress controller not properly deployed
- Backend services not available
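As a reference point, a minimal working Ingress looks like the sketch below (the hostname and service name are illustrative, and ingressClassName assumes the ingress-nginx controller):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  namespace: <namespace>
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 80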
7. Check CNI Configuration
Issues with the CNI plugin can cause network connectivity problems:
# Check CNI plugin pods (labels and namespaces vary by plugin and version)
kubectl get pods -n kube-system -l k8s-app=calico-node   # For Calico
kubectl get pods -n kube-system -l k8s-app=cilium        # For Cilium
kubectl get pods -n kube-flannel -l app=flannel          # For Flannel (older installs use kube-system)
# Check CNI plugin logs
kubectl logs -n kube-system <cni-pod-name>
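If a CNI pod is crash-looping, it can also help to inspect the CNI configuration on the node itself (requires SSH or node shell access):
# On the node: list installed CNI configurations
ls /etc/cni/net.d/
# On the node: verify the CNI binaries are present
ls /opt/cni/bin/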
8. Diagnosing LoadBalancer Service Issues
For LoadBalancer type services:
# Check service status
kubectl get svc <service-name> -n <namespace>
# Look for events related to the service
kubectl describe svc <service-name> -n <namespace>
# Check cloud provider integration logs (label and namespace vary by provider)
kubectl logs -n kube-system -l k8s-app=cloud-controller-manager
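If the EXTERNAL-IP column stays at <pending>, the cluster has no working load-balancer integration: on a cloud provider this usually points to a cloud-controller-manager problem, while on bare metal you need a load-balancer implementation such as MetalLB.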
Practical Example: Resolving Service Connectivity Issues
Let’s work through a practical example where a frontend service can’t communicate with a backend service:
# Verify both services are running
kubectl get svc -n app-namespace
# NAME       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
# frontend   ClusterIP   10.96.45.157   <none>        80/TCP     1h
# backend    ClusterIP   10.96.78.202   <none>        8080/TCP   1h
# Check if backend has endpoints
kubectl get endpoints backend -n app-namespace
# NAME      ENDPOINTS   AGE
# backend   <none>      1h
# No endpoints, so check pod labels
kubectl get pods -l app=backend -n app-namespace
# No resources found
# Check what pods actually exist
kubectl get pods -n app-namespace
# NAME                        READY   STATUS    RESTARTS   AGE
# backend-7c9b4f5869-8k2vl    1/1     Running   0          1h
# frontend-5d4d7b8476-2xvz7   1/1     Running   0          1h
# Check the backend pod's labels
kubectl describe pod backend-7c9b4f5869-8k2vl -n app-namespace
# Labels: app=backend-service <-- Mismatch with service selector
# Update the service to match the correct labels
kubectl patch svc backend -n app-namespace -p '{"spec":{"selector":{"app":"backend-service"}}}'
# Verify endpoints are now created
kubectl get endpoints backend -n app-namespace
# NAME      ENDPOINTS          AGE
# backend   10.244.2.15:8080   1h15m
# Test connectivity from frontend to backend
kubectl exec -it frontend-5d4d7b8476-2xvz7 -n app-namespace -- curl backend:8080
# Response from backend service
Common Networking Issues and Solutions
1. Service Has No Endpoints
Issue: Service selector doesn’t match any pod labels. Solution: Ensure pod labels match service selector.
# Find the pod labels
kubectl get pod <pod-name> -n <namespace> --show-labels
# Update service selector to match pod labels
kubectl patch svc <service-name> -n <namespace> -p '{"spec":{"selector":{"app":"correct-label"}}}'
2. DNS Resolution Fails
Issue: CoreDNS is not functioning correctly. Solution: Check and fix CoreDNS deployment.
# Restart CoreDNS pods
kubectl rollout restart deployment coredns -n kube-system
# Inspect the DNS configuration a pod actually receives
kubectl exec -it <pod-name> -n <namespace> -- cat /etc/resolv.conf
3. Network Policy Blocking Traffic
Issue: Overly restrictive network policies. Solution: Update network policies to allow necessary traffic.
# Create a temporary permissive policy for debugging
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-temporarily
  namespace: <namespace>
spec:
  podSelector: {}
  ingress:
  - {}
  egress:
  - {}
  policyTypes:
  - Ingress
  - Egress
EOF
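Remember to delete this policy as soon as debugging is complete, since it removes all network segmentation in the namespace:
# Remove the temporary policy after debugging
kubectl delete networkpolicy allow-all-temporarily -n <namespace>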
4. Ingress Not Routing Traffic
Issue: Misconfigured ingress or ingress controller issues. Solution: Check and fix ingress configuration.
# Verify ingress class is correct
kubectl get ingressclass
# Update ingress to use the correct class
kubectl patch ingress <ingress-name> -n <namespace> -p '{"spec":{"ingressClassName":"nginx"}}'
5. Pod CIDR Conflicts
Issue: Pod CIDR ranges conflict across nodes. Solution: Reconfigure the CNI plugin or node network configuration.
# Check pod CIDR allocation per node
kubectl get nodes -o custom-columns=NAME:.metadata.name,POD-CIDR:.spec.podCIDR
Preventive Measures
- Document Your Network Architecture: Keep detailed documentation of your network design, including CIDR ranges, services, and policies.
- Implement Network Observability: Use tools like Cilium Hubble or Calico’s network policy logs to monitor and visualize network traffic.
- Use Standard Network Troubleshooting Pods: Create a standard debug DaemonSet that includes networking tools:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: network-diagnostic
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: network-diagnostic
  template:
    metadata:
      labels:
        app: network-diagnostic
    spec:
      hostNetwork: true
      containers:
      - name: network-tools
        image: nicolaka/netshoot
        command: ["sleep", "infinity"]
        securityContext:
          privileged: true
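With the DaemonSet in place, you can open a shell in the diagnostic pod on any given node; the spec.nodeName field selector narrows the pod list to that node:
# Exec into the diagnostic pod running on a specific node
kubectl exec -it -n kube-system \
  $(kubectl get pod -n kube-system -l app=network-diagnostic \
    --field-selector spec.nodeName=<node-name> -o name) -- bash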
- Create Network Validation Tests: Implement automated tests that validate network connectivity between components.
By following this systematic approach to troubleshooting Kubernetes networking issues, you can effectively diagnose and resolve connectivity problems in your cluster, ensuring reliable communication between services.