Debugging Pods — kubectl debug, Ephemeral Containers, exec Guide#
"kubectl logs gives no output, the pod is in CrashLoopBackOff. Production runs a distroless image; there is no shell. Knowing how to debug pods turns a middle-of-the-night SEV2 into a 5-minute fix instead of a 60-minute one."
This guide covers concrete commands for pod-level debugging: ephemeral containers, debugging distroless images, and the most common failure scenarios.
🔍 Step 1: kubectl describe pod#
What to inspect#
- Status / Reason: CrashLoopBackOff, OOMKilled, ImagePullBackOff
- Events: scheduling, image pulls, liveness probe failures
- Containers / Last State: exit code, reason
Common reasons#
| Reason | Fix |
|---|---|
| ImagePullBackOff | imagePullSecret missing? wrong registry credentials? |
| CrashLoopBackOff | inspect the logs and the exit code |
| OOMKilled | memory limit too low |
| Pending | no schedulable node: taints or insufficient resources |
| Init:Error | an init container failed |
| CreateContainerConfigError | referenced ConfigMap / Secret missing |
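The exit code shown in `Last State: Terminated` follows the shell convention: codes above 128 mean the process was killed by a signal (code - 128). A small offline sketch of that mapping; the helper name `explain_exit_code` is ours, not a kubectl feature:

```shell
#!/bin/sh
# explain_exit_code: hypothetical helper mapping a container exit code
# from `Last State: Terminated` to its most likely cause.
explain_exit_code() {
  case "$1" in
    0)   echo "Success" ;;
    1)   echo "Application error (check kubectl logs --previous)" ;;
    137) echo "SIGKILL (128+9): likely OOMKilled, check memory limits" ;;
    139) echo "SIGSEGV (128+11): segmentation fault" ;;
    143) echo "SIGTERM (128+15): graceful shutdown requested" ;;
    *)   if [ "$1" -gt 128 ]; then
           echo "Killed by signal $(($1 - 128))"
         else
           echo "Application-defined exit code $1"
         fi ;;
  esac
}

explain_exit_code 137   # → SIGKILL (128+9): likely OOMKilled, check memory limits
```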
📜 Step 2: kubectl logs#
# Current logs
kubectl logs <POD> -n <NS>
# Logs from the previous crash (critical for CrashLoopBackOff)
kubectl logs <POD> -n <NS> --previous
# Multi-container pod
kubectl logs <POD> -n <NS> -c <CONTAINER>
# All containers
kubectl logs <POD> -n <NS> --all-containers
# Tail + follow
kubectl logs <POD> -n <NS> -f --tail=100
# Time-based (--since and --since-time are mutually exclusive)
kubectl logs <POD> -n <NS> --since=1h
kubectl logs <POD> -n <NS> --since-time="2026-05-04T14:30:00Z"
🔑 --previous is the golden key of crash debugging: it returns the last logs written before the container crashed.
🐚 Step 3: kubectl exec (if a shell exists)#
# Open a shell
kubectl exec -it <POD> -n <NS> -- sh
# or
kubectl exec -it <POD> -n <NS> -- bash
# Single command
kubectl exec <POD> -n <NS> -- ps aux
kubectl exec <POD> -n <NS> -- env
kubectl exec <POD> -n <NS> -- cat /etc/resolv.conf
Distroless / scratch images (no shell!)#
→ Use kubectl debug (below).
🩺 Step 4: kubectl debug (Ephemeral Containers)#
Beta (enabled by default) since K8s 1.23, stable since 1.25. The gold standard for debugging distroless / scratch images.
Ephemeral container in an existing pod#
kubectl debug -it <POD> -n <NS> --image=nicolaka/netshoot
→ a shell inside the same pod, using the netshoot image (curl, dig, tcpdump, ss, nc).
--target (process namespace sharing)#
kubectl debug -it <POD> --image=busybox --target=<MAIN_CONTAINER>
# the busybox container shares the main container's process namespace
ps aux  # shows the main container's processes
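Under the hood, `kubectl debug` patches the pod's `ephemeralContainers` subresource. A sketch of the resulting spec fragment, assuming the flags above (the generated name is illustrative):

```yaml
# Fragment added to the Pod by `kubectl debug --target=<MAIN_CONTAINER>`
spec:
  ephemeralContainers:
  - name: debugger-abc12                   # name generated by kubectl
    image: busybox
    stdin: true
    tty: true
    targetContainerName: <MAIN_CONTAINER>  # share its process namespace
```

Note that ephemeral containers cannot be removed once added; they disappear only when the pod is recreated.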
Debugging with a pod copy#
kubectl debug <POD> --image=nicolaka/netshoot --copy-to=<POD>-debug --share-processes
# the original pod stays intact; the copy is for debugging
Node debug#
kubectl debug node/<NODE> -it --image=ubuntu
# starts a debug pod on the node, with the host filesystem mounted at /host
chroot /host
# now operating on the node's filesystem
🌐 Step 5: Network Debug#
# DNS test
kubectl exec <POD> -- nslookup kubernetes.default.svc
# Service connectivity
kubectl exec <POD> -- nc -zv <SERVICE>.<NS>.svc.cluster.local 80
# External
kubectl exec <POD> -- curl -v https://api.example.com
# tcpdump (via a netshoot ephemeral container)
kubectl debug -it <POD> --image=nicolaka/netshoot
tcpdump -i any -n port 80
# NetworkPolicy debug
kubectl describe networkpolicy -n <NS>
# with Cilium:
cilium policy trace --src-pod <NS>/<POD> --dst-pod <NS>/<DST>
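If ephemeral containers are unavailable, or you want a probe that survives across several exec sessions, a throwaway netshoot pod is an alternative. A minimal sketch; the pod name is a placeholder:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: netshoot-debug          # placeholder name
  namespace: <NS>
spec:
  containers:
  - name: netshoot
    image: nicolaka/netshoot
    command: ["sleep", "infinity"]   # keep the pod alive for exec sessions
  restartPolicy: Never
```

Then `kubectl exec -it netshoot-debug -n <NS> -- bash` and run the same nc / dig / tcpdump tests from inside the cluster network. Delete the pod when done.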
📦 Step 6: ConfigMap / Secret Debug#
# Env vars injected into the pod
kubectl exec <POD> -- env | grep -i app
# ConfigMap contents
kubectl get configmap <CM> -n <NS> -o yaml
# Secret (base64-encoded)
kubectl get secret <SEC> -n <NS> -o jsonpath='{.data}' | jq
# Decode a secret value
kubectl get secret <SEC> -n <NS> -o jsonpath='{.data.password}' | base64 -d
# Mounted secrets inside the pod
kubectl exec <POD> -- ls /etc/secrets
kubectl exec <POD> -- cat /etc/secrets/api-key
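The decode step can be rehearsed offline, since Secret `.data` values are plain base64. `decode_secret_lines` is a hypothetical helper, fed here with a canned `key=value` line of the kind a go-template over `.data` would produce:

```shell
#!/bin/sh
# decode_secret_lines: hypothetical helper decoding `key=<base64>` lines,
# e.g. as produced by:
#   kubectl get secret <SEC> -o go-template='{{range $k,$v := .data}}{{$k}}={{$v}}{{"\n"}}{{end}}'
decode_secret_lines() {
  while read -r line; do
    key=${line%%=*}          # text before the first '='
    b64=${line#*=}           # everything after it (base64, may end in '=')
    printf '%s=%s\n' "$key" "$(printf '%s' "$b64" | base64 -d)"
  done
}

printf 'password=aHVudGVyMg==\n' | decode_secret_lines   # → password=hunter2
```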
💾 Step 7: Volume Debug#
# The pod's volumes
kubectl describe pod <POD> | grep -A 10 Volumes
# PV / PVC status
kubectl get pvc -n <NS>
kubectl describe pvc <PVC> -n <NS>
# Volume contents (inside the pod)
kubectl exec <POD> -- ls /var/data
kubectl exec <POD> -- df -h
🔥 Step 8: Resources / Performance#
# CPU + memory usage
kubectl top pod <POD> -n <NS>
kubectl top node
# Resource limits
kubectl get pod <POD> -n <NS> -o jsonpath='{.spec.containers[*].resources}'
# Heap dump (Java)
kubectl exec <POD> -- jmap -dump:format=b,file=/tmp/heap.hprof <PID>
kubectl cp <NS>/<POD>:/tmp/heap.hprof ./heap.hprof
# open with MAT or VisualVM
# Go pprof
kubectl port-forward <POD> 6060:6060
go tool pprof http://localhost:6060/debug/pprof/heap
🚨 Scenario: CrashLoopBackOff#
# 1. Status
kubectl get pod <POD> -n <NS>
# CrashLoopBackOff (5)
# 2. Events
kubectl describe pod <POD> -n <NS>
# Last State: Terminated, Reason: Error, Exit Code: 1
# 3. Crash logs (the key step!)
kubectl logs <POD> -n <NS> --previous
# Output: "Failed to connect to database: ..."
# 4. ConfigMap / Secret check
kubectl exec <POD> -- env | grep DATABASE
# DATABASE_URL may be wrong
# 5. Network test
kubectl debug -it <POD> --image=nicolaka/netshoot
nc -zv <DB>.<NS>.svc.cluster.local 5432
# Connection refused → the service is missing or a NetworkPolicy is blocking
🚨 Scenario: ImagePullBackOff#
kubectl describe pod <POD> -n <NS>
# Failed to pull image "<REGISTRY>/<APP>:<TAG>": ...
# unauthorized: authentication required
# imagePullSecrets check
kubectl get pod <POD> -o yaml | grep imagePullSecrets
# Secret contents
kubectl get secret <PULL_SECRET> -o yaml
# Manual test
docker pull <REGISTRY>/<APP>:<TAG>
# does it work outside the cluster?
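A common fix is wiring an `imagePullSecrets` reference into the pod spec. A hedged sketch; the secret name is a placeholder and must already exist (created e.g. with `kubectl create secret docker-registry`):

```yaml
# Pod spec fragment: reference a registry pull secret
spec:
  imagePullSecrets:
  - name: <PULL_SECRET>        # must exist in the same namespace as the pod
  containers:
  - name: app
    image: <REGISTRY>/<APP>:<TAG>
```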
🚨 Scenario: Pending (Scheduling Failed)#
kubectl describe pod <POD> -n <NS>
# Events:
# Warning FailedScheduling ... 0/3 nodes are available:
# 3 Insufficient cpu, 3 Insufficient memory.
# Causes
# - Insufficient resources → Cluster Autoscaler / Karpenter
# - Taint mismatch → add tolerations
# - No PV available → provision a StorageClass + PV
# - No nodeAffinity match → check node labels
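For the taint-mismatch case, a toleration sketch; the key/value pair is a placeholder and must match the taint shown by `kubectl describe node`:

```yaml
# Pod spec fragment: tolerate a hypothetical dedicated=gpu:NoSchedule taint
spec:
  tolerations:
  - key: "dedicated"           # placeholder: match the node's taint key
    operator: "Equal"
    value: "gpu"               # placeholder: match the taint value
    effect: "NoSchedule"
```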
🚨 Scenario: Liveness Probe Failing (Crash Loop)#
kubectl describe pod <POD>
# Liveness probe failed: HTTP probe failed with statuscode: 503
# Probe config
kubectl get pod <POD> -o yaml | grep -A 10 livenessProbe
# Is the probe too aggressive?
# initialDelaySeconds too low
# timeoutSeconds too low
# failureThreshold too low
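A less aggressive probe sketch; the path, port, and timings are illustrative and should come from the app's measured startup time. A `startupProbe` takes the startup pressure off `initialDelaySeconds` entirely:

```yaml
# Container spec fragment: probe values are assumptions, tune per service
livenessProbe:
  httpGet:
    path: /healthz             # placeholder health endpoint
    port: 8080
  initialDelaySeconds: 15      # >= measured app startup time
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
startupProbe:                  # shields slow starts from the liveness probe
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30         # up to 30 * 5s = 150s to start
  periodSeconds: 5
```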
🚫 Anti-Pattern Table#
| Anti-pattern | Why it hurts | Do instead |
|---|---|---|
| Skipping kubectl logs --previous | The crash cause is lost | Run --previous on every crash |
| Trying to SSH into distroless | There is no shell | kubectl debug --image=netshoot |
| Building a debug image into the production pod | Requires image rebuild + redeploy | Ephemeral container |
| Liveness initialDelay of 0 | Causes a crash loop | Set it from the app's startup time |
| Guessing resource limits without profiling | OOM / throttling | Profile, then add a buffer |
| Declaring "service down" after one pod | The rest of the replica set is ignored | Check all replicas |
| "kubectl restart" on a network issue | Root cause stays unknown | tcpdump + dig first |
| Restarting the pod right after a heap dump | The dump file is lost | Copy it out with kubectl cp |
| cluster-admin for pod debugging | Privilege abuse risk | A limited debug role via RBAC |
📋 Pod Debug Toolkit Checklist#
[ ] netshoot image available to the cluster (mirrored / pre-pulled)
[ ] kubectl debug commands in the runbook
[ ] Liveness/readiness probe config documented
[ ] Heap dump procedure (Java/Go)
[ ] Log aggregation (Loki) + structured logging
[ ] Disciplined ConfigMap / Secret management (ESO)
[ ] Baseline resource profile (per service)
[ ] Pod debug RBAC (SRE / on-call only)
[ ] PodDisruptionBudget defined
[ ] On-call: pod debug runbook (for every alert tier)
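The "limited debug role" item could look roughly like this. The Role name is a placeholder; the `pods/ephemeralcontainers` subresource requires K8s 1.23+:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-debugger           # placeholder name
  namespace: <NS>
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["pods/exec", "pods/ephemeralcontainers"]
  verbs: ["create", "update", "patch"]
```

Bind it to the SRE / on-call group with a RoleBinding instead of handing out cluster-admin.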
📚 References#
- kubectl debug — kubernetes.io/docs/tasks/debug/debug-application/
- Ephemeral Containers — kubernetes.io/docs/concepts/workloads/pods/ephemeral-containers/
- netshoot — github.com/nicolaka/netshoot
- Troubleshooting Pods — kubernetes.io/docs/tasks/debug/debug-application/debug-pods/
Related: Production-Checklist.md · HPA-VPA-KEDA.md · Resource-Limits-Guide.md · 09-Networking/Network-Troubleshooting.md · 11-SRE/Runbook-Template.md
"Pod debug 'tool seçmek' değil — systemic flow. describe → logs --previous → exec/debug → network test. 5 dakikada root cause ya da 60 dakika tahminle geçen vardiya."