Storage is a critical component in Kubernetes for stateful applications, but it can be challenging to troubleshoot when things go wrong. This guide will help you diagnose and resolve common Kubernetes storage issues.
Understanding Kubernetes Storage Components
Before diving into troubleshooting, let’s understand the key storage components in Kubernetes:
- PersistentVolume (PV): A cluster resource representing storage in the cluster
- PersistentVolumeClaim (PVC): A request for storage by a user
- StorageClass: Defines the provisioner and parameters for dynamically provisioned PVs
- Volume: A directory accessible to containers in a pod
- CSI (Container Storage Interface): Standard for exposing storage systems to containers
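To see how these pieces fit together, here is a minimal PVC sketch requesting dynamically provisioned storage; the class name standard is a placeholder for whatever StorageClass exists in your cluster:

```yaml
# A minimal PVC: requests 5Gi from an assumed "standard" StorageClass.
# When a provisioner for that class is running, Kubernetes creates a
# matching PV and binds it to this claim automatically.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: standard
```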
Step-by-Step Troubleshooting Process
1. Identify the Problem PVC and Its Status
# List all PVCs in the namespace
kubectl get pvc -n <namespace>
# Get details about a specific PVC
kubectl describe pvc <pvc-name> -n <namespace>
Check for:
- PVC status (Pending, Bound, Lost)
- Events section for error messages
- The PV that the PVC is bound to (if any)
- Storage class being used
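The checks above can be condensed into a small helper; phase_hint is a hypothetical function name, and the advice strings summarize the sections that follow:

```shell
# Sketch: map a PVC phase (the STATUS column of `kubectl get pvc`) to the
# next troubleshooting step. `phase_hint` is a hypothetical helper name.
phase_hint() {
  case "$1" in
    Pending) echo "Pending: check events; likely no matching PV or a StorageClass/provisioner problem" ;;
    Bound)   echo "Bound: the claim is fine; investigate the pod's volume mount instead" ;;
    Lost)    echo "Lost: the backing PV is gone; check the PV and the storage backend" ;;
    *)       echo "Unknown phase: $1" ;;
  esac
}

# Example: feed it the phase extracted from a live cluster, e.g.
#   phase_hint "$(kubectl get pvc <pvc-name> -n <namespace> -o jsonpath='{.status.phase}')"
phase_hint Pending
```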
2. Check the Associated PV
# List all PVs in the cluster
kubectl get pv
# Get details about a specific PV
kubectl describe pv <pv-name>
Look for:
- PV status (Available, Bound, Released, Failed)
- Reclaim policy
- Storage class
- Access modes
- Mount options
- Node affinity
3. Verify StorageClass Configuration
# List all storage classes
kubectl get storageclass
# Get details about a specific storage class
kubectl describe storageclass <storageclass-name>
Check for:
- Provisioner (must be running in the cluster)
- Parameters specific to the provisioner
- ReclaimPolicy
- VolumeBindingMode
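VolumeBindingMode matters more than it looks: with Immediate, the volume is provisioned before the pod is scheduled, which can place it in a zone the pod cannot reach. A topology-safe sketch (the class name is illustrative, and the provisioner shown is the AWS EBS CSI driver as an example):

```yaml
# WaitForFirstConsumer delays provisioning until a pod using the claim is
# scheduled, so the volume is created in the same zone as that pod.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: topology-aware       # illustrative name
provisioner: ebs.csi.aws.com # example CSI provisioner; use your own
volumeBindingMode: WaitForFirstConsumer
```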
4. Check for CSI Driver Issues
If your cluster uses CSI drivers:
# List CSI drivers
kubectl get csidrivers
# Check CSI driver pods
kubectl get pods -n <csi-namespace> -l app=<csi-driver-name>
# Check CSI driver logs
kubectl logs -n <csi-namespace> <csi-driver-pod> -c <container-name>
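CSI logs can be long; a grep for common failure signatures narrows them down quickly. The sample log lines below are illustrative; in practice, pipe the real driver logs in:

```shell
# Sketch: filter CSI driver logs for common failure signatures. Pipe real
# logs in with:
#   kubectl logs -n <csi-namespace> <csi-driver-pod> -c <container-name> | scan_csi_logs
scan_csi_logs() {
  grep -E -i 'rpc error|failed to provision|timed out|permission denied' || true
}

# Demonstrated here on sample (made-up) log lines:
scan_csi_logs <<'EOF'
I0101 10:00:00 controller.go:100] provisioning volume for claim "default/app-data"
E0101 10:00:05 controller.go:200] failed to provision volume: rpc error: code = Internal
EOF
```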
5. Investigate Pod Volume Mount Issues
If the pod can’t mount the volume:
# Check pod status and events
kubectl describe pod <pod-name> -n <namespace>
# Check pod logs
kubectl logs <pod-name> -n <namespace>
# Check kubelet logs on the node
kubectl get pod <pod-name> -n <namespace> -o wide
# Note the node name, then check kubelet logs on that node
ssh <node>
sudo journalctl -u kubelet | grep <pv-name>
Common Storage Issues and Solutions
1. PVC Stuck in Pending State
Issue: PVC remains in Pending state and doesn’t get bound to a PV.
Diagnosis:
kubectl describe pvc <pvc-name> -n <namespace>
# Look for events that explain why it's pending
Common causes and solutions:
- No matching PV available:
- For static provisioning: Create a PV with matching capacity and access modes
- For dynamic provisioning: Ensure the specified storage class exists and its provisioner is working
apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual
  hostPath:
    path: "/mnt/data"
- StorageClass doesn’t exist or has issues:
- Verify the storage class exists
- Check the provisioner is deployed correctly
# Create a standard storage class if needed
kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/aws-ebs # Change as per your environment
parameters:
  type: gp2
reclaimPolicy: Delete
volumeBindingMode: Immediate
EOF
- CSI driver or external provisioner issues:
- Check CSI driver logs
- Ensure cloud provider credentials are correct
2. Volume Mount Failures
Issue: Pod can’t mount volumes even though PVC is bound.
Diagnosis:
kubectl describe pod <pod-name> -n <namespace>
# Look for mount failure events
Common causes and solutions:
- Filesystem issues:
- Check if the filesystem is corrupted
- Use a debug pod to mount the volume and check the filesystem:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: volume-debug
  namespace: <namespace>
spec:
  containers:
  - name: debug
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: problematic-volume
      mountPath: /data
  volumes:
  - name: problematic-volume
    persistentVolumeClaim:
      claimName: <pvc-name>
EOF
Then check the filesystem:
kubectl exec -it volume-debug -n <namespace> -- sh
# Inside the container
ls -la /data
df -h
- Permission issues:
- Check file ownership and permissions
- Adjust SecurityContext for the pod:
securityContext:
  runAsUser: 1000
  fsGroup: 1000
- Node issues:
- Check if the node has access to the storage backend
- For zone-specific volumes, ensure pods are scheduled in the correct zone:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - us-east-1a
3. Volume Expansion Issues
Issue: PVC resize requests not being fulfilled.
Diagnosis:
kubectl describe pvc <pvc-name> -n <namespace>
# Look for resize-related events
Common causes and solutions:
- StorageClass doesn’t support volume expansion:
- Check if the storage class has allowVolumeExpansion: true set
- Create a new storage class with volume expansion enabled:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-sc
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
allowVolumeExpansion: true
- CSI driver doesn’t support expansion:
- Upgrade the CSI driver to a version that supports expansion
- Check CSI driver documentation for expansion support
- Filesystem expansion needed:
- For some volume types, the filesystem must be expanded after the underlying volume itself has been resized
- Restart the pod to trigger filesystem expansion if the driver does not support online expansion
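Assuming the class allows expansion, the resize itself is just an edit to the claim's requested size; the numbers below are illustrative:

```yaml
# Edit the claim (kubectl edit pvc <pvc-name> -n <namespace>) and raise the
# request; shrinking a PVC is not supported.
spec:
  resources:
    requests:
      storage: 20Gi   # was 10Gi
```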
4. Performance Issues
Issue: Storage performance is slower than expected.
Diagnosis:
# Deploy a benchmark pod
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: io-test
  namespace: <namespace>
spec:
  containers:
  - name: io-test
    image: nixery.dev/shell/fio/ioping
    command: ["sleep", "3600"]
    volumeMounts:
    - name: test-volume
      mountPath: /test-data
  volumes:
  - name: test-volume
    persistentVolumeClaim:
      claimName: <pvc-name>
EOF
# Run IO tests
kubectl exec -it io-test -n <namespace> -- fio --name=test --filename=/test-data/test --direct=1 --rw=randread --bs=4k --size=1G --numjobs=1 --time_based --runtime=60 --group_reporting
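To compare runs or alert on regressions, it helps to pull the headline number out of fio's summary. A sketch, demonstrated on a sample line mirroring fio's "read: IOPS=..." output (pipe real fio output in instead):

```shell
# Sketch: extract the read IOPS figure from fio's summary output.
extract_iops() {
  awk -F'[=,]' '/read ?: *IOPS/ {gsub(/ /, "", $2); print $2; exit}'
}

# Sample fio summary line; replace with real output in practice.
extract_iops <<'EOF'
  read: IOPS=2310, BW=9240KiB/s (9462kB/s)(541MiB/60001msec)
EOF
```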
Common causes and solutions:
- Incorrect storage class or parameters:
- Use storage classes optimized for your workload:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: high-performance
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1
  iopsPerGB: "50"
- Resource contention:
- Check for noisy neighbors
- Consider using local volumes for performance-sensitive workloads:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
- Network bottlenecks:
- For network-attached storage, check network throughput
- Consider colocation of pods with their volumes
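Note that kubernetes.io/no-provisioner means no dynamic provisioning: each local PV must be created by hand and pinned to its node with node affinity. A sketch, where the path and node name are placeholders:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-1            # illustrative name
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1     # must already exist on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker-node-1     # placeholder node name
```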
Practical Example: Resolving a PVC in Pending State
Let’s work through a practical example where a PVC is stuck in Pending state:
# Check PVC status
kubectl get pvc app-data -n production
# NAME       STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
# app-data   Pending                                      fast-storage   1h
# Get more details
kubectl describe pvc app-data -n production
# Events:
# Warning ProvisioningFailed 5m (x12) persistentvolume-controller Failed to provision volume with StorageClass "fast-storage": StorageClass "fast-storage" not found
# Check available storage classes
kubectl get storageclass
# NAME       PROVISIONER            AGE
# standard   kubernetes.io/gce-pd   30d
# ssd        kubernetes.io/gce-pd   30d
# Create the missing storage class
kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-storage
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
reclaimPolicy: Delete
volumeBindingMode: Immediate
EOF
# Check PVC status again (should transition to Bound)
kubectl get pvc app-data -n production
# NAME       STATUS   VOLUME                  CAPACITY   ACCESS MODES   STORAGECLASS   AGE
# app-data   Bound    pvc-12345678-1234-...   10Gi       RWO            fast-storage   1h30m
Preventive Measures
- Create Storage Class Templates: Maintain documented templates for commonly used storage requirements.
- Use Storage Class Validation: Validate PVC and storage class compatibility before deployment:
# Simple validation script
kubectl get pvc <pvc-name> -n <namespace> -o jsonpath='{.spec.storageClassName}' | xargs kubectl get storageclass
- Monitor Storage Usage: Set up alerts for PVCs approaching capacity:
# Using Prometheus query
kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes > 0.8
- Test Volume Resizing: Regularly test volume expansion capabilities.
- Document Storage Requirements: Maintain documentation about storage requirements for each application.
- Setup Regular Backups: Implement regular backups of persistent data:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: app-data-snapshot
spec:
  volumeSnapshotClassName: csi-snapshot-class
  source:
    persistentVolumeClaimName: app-data
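A snapshot is only useful if you can restore it: a new PVC can name the snapshot as its data source. A sketch, where the names match the snapshot example above and the storage class is a placeholder (it must use the same CSI driver that took the snapshot):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data-restored
spec:
  storageClassName: csi-storage-class   # placeholder
  dataSource:
    name: app-data-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```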
By understanding the common storage issues and implementing these preventive measures, you can maintain reliable storage operations in your Kubernetes environment.