CKA: Certified Kubernetes Administrator Study Guide
The Certified Kubernetes Administrator (CKA) is a hands-on, performance-based exam from the Cloud Native Computing Foundation that validates your ability to install, configure, and operate production Kubernetes clusters. You solve real tasks in live clusters from a terminal within 120 minutes, so command fluency and kubectl speed matter as much as conceptual knowledge. It is aimed at administrators, DevOps and platform engineers, and SREs who manage Kubernetes day to day.
Domain 1: Cluster Architecture, Installation, and Configuration
- The control plane consists of kube-apiserver (the REST front end through which every other component reads and writes cluster state), kube-scheduler (assigns Pods to nodes), kube-controller-manager (runs built-in controller loops), and etcd (the backing store); cloud-controller-manager is optional for cloud integrations.
- etcd is the sole backing store for all cluster state; back it up with 'etcdctl snapshot save <path>', supplying --endpoints, --cacert, --cert, and --key when TLS is enabled (and ETCDCTL_API=3 on older etcdctl releases that do not default to the v3 API); restore with 'etcdctl snapshot restore <path> --data-dir <new-dir>', which reads the local snapshot file and needs no TLS flags, then point the etcd static Pod manifest at the new data directory.
- kubeadm init bootstraps a control plane: it generates PKI certs and kubeconfigs, writes static Pod manifests to /etc/kubernetes/manifests/, sets up RBAC, and deploys the kube-proxy and CoreDNS add-ons.
- After kubeadm init, copy /etc/kubernetes/admin.conf to $HOME/.kube/config and run 'chown $(id -u):$(id -g) $HOME/.kube/config' so a non-root user can run kubectl.
- kubeadm runs etcd and the control plane components as static Pods; the kubelet watches /etc/kubernetes/manifests/ and manages them directly without the API server, so editing a manifest there restarts that component.
- Worker nodes join with 'kubeadm join' using a bootstrap token plus the CA cert hash (--discovery-token-ca-cert-hash sha256:...); regenerate an expired token with 'kubeadm token create --print-join-command'.
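- Upgrade order: on the first control plane node, upgrade kubeadm and run 'kubeadm upgrade apply <version>', then drain the node, upgrade kubelet and kubectl, restart the kubelet, and uncordon; repeat on the remaining control plane nodes using 'kubeadm upgrade node' instead of 'kubeadm upgrade apply'; then upgrade workers one at a time (upgrade kubeadm, run 'kubeadm upgrade node', drain, upgrade kubelet, restart the kubelet, uncordon).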
- A cluster needs a CNI-compatible network plugin such as Calico, Flannel, Cilium, or Weave Net; until one is installed CoreDNS Pods stay Pending and nodes can report NotReady.
- RBAC uses Roles and RoleBindings (namespace-scoped) and ClusterRoles and ClusterRoleBindings (cluster-scoped); a RoleBinding can reference a ClusterRole to grant its permissions scoped to just that single namespace.
- ServiceAccounts provide identity for Pods; bind them to a Role or ClusterRole with a RoleBinding or ClusterRoleBinding (e.g. 'kubectl create clusterrolebinding ... --clusterrole=cluster-admin --serviceaccount=<namespace>:<name>'; use --user=<name> instead when the subject is a human user).
- The kubelet runs as a native systemd process on every node (not in a container) because it manages the container runtime itself; restart it with 'systemctl restart kubelet'.
- Users authenticate to the API server via X.509 client certificates, bearer tokens, OpenID Connect tokens, or ServiceAccount tokens; Kubernetes has no built-in user object, so users come from certs or external identity providers.
- Approve a CertificateSigningRequest with 'kubectl certificate approve <name>'; check control-plane cert expiry with 'kubeadm certs check-expiration' and renew with 'kubeadm certs renew all' (or name a single certificate instead of 'all').
- 'kubectl drain <node> --ignore-daemonsets' cordons a node and evicts its Pods for maintenance; add --force if the node runs Pods that are not managed by a controller, and --delete-emptydir-data if any Pod uses emptyDir; 'kubectl uncordon <node>' returns it to scheduling.
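The RBAC bullets above can be sketched as a manifest. This is a minimal illustration, not a recommended policy: the namespace "dev" and the ServiceAccount "app-reader" are hypothetical names, while "view" is one of the built-in ClusterRoles. It shows a RoleBinding that references a ClusterRole, so the permissions apply only inside the binding's namespace.

```yaml
# Hypothetical names: namespace "dev", ServiceAccount "app-reader".
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-reader
  namespace: dev
---
# A RoleBinding (namespace-scoped) that points at a ClusterRole.
# The built-in "view" ClusterRole is granted only within "dev".
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-reader-view
  namespace: dev
subjects:
- kind: ServiceAccount
  name: app-reader
  namespace: dev
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
```

One way to check the result is 'kubectl auth can-i list pods -n dev --as=system:serviceaccount:dev:app-reader', which should answer yes in "dev" and no in any other namespace.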
Domain 2: Workloads and Scheduling
- A Deployment manages a ReplicaSet and Pod template for stateless workloads; create one imperatively with 'kubectl create deployment web-app --image=nginx:1.25 --replicas=3'.
- Deployment strategies are RollingUpdate (default, controlled by maxSurge and maxUnavailable) and Recreate (terminates all old Pods before creating new ones); update an image with 'kubectl set image deployment/frontend nginx=nginx:1.25'.
- 'kubectl rollout undo deployment/<name>' rolls back to the previous revision; view history with 'kubectl rollout history' and roll back to a specific one with --to-revision.
- resources.requests sets the guaranteed minimum CPU and memory the scheduler uses for placement; resources.limits caps usage; a Pod is QoS class Guaranteed only when every container sets both CPU and memory requests and limits, with each request equal to its limit.
- Probe types are livenessProbe (restarts a stuck container), readinessProbe (removes a Pod from Service endpoints when failing), and startupProbe (protects slow-starting containers from premature liveness restarts).
- Taints (kubectl taint) repel Pods from nodes; effects are NoSchedule (block new Pods), PreferNoSchedule (soft), and NoExecute (block new and evict existing Pods lacking a matching toleration, honoring optional tolerationSeconds).
- Control plane nodes carry the taint node-role.kubernetes.io/control-plane:NoSchedule, so DaemonSets and ordinary Pods skip them unless the Pod spec adds a matching toleration.
- nodeSelector matches node labels for simple placement; node affinity (requiredDuringSchedulingIgnoredDuringExecution or preferred...) adds richer matchExpressions logic; pod affinity and anti-affinity place Pods relative to other Pods.
- A Job runs Pods to completion; spec.completions sets how many successful completions are needed and spec.parallelism how many run at once; restartPolicy must be OnFailure or Never (never Always).
- A CronJob schedules Jobs on a cron expression; the default concurrencyPolicy is Allow (overlapping runs are permitted), while Forbid skips a new run if the previous one is still active and Replace cancels the running Job and starts a new one.
- A DaemonSet runs one Pod on every eligible node (and on new nodes automatically); StatefulSets create Pods in stable ordinal order (db-0, then db-1...), each waiting for the prior to be Running and Ready, with stable network identities and per-Pod PVCs.
- The HorizontalPodAutoscaler scales replicas using resource metrics (CPU, memory), custom metrics (custom.metrics.k8s.io), or external metrics; resource-metric scaling requires the metrics-server to be installed.
- Pod topology spread constraints (topologySpreadConstraints with topologyKey and maxSkew) distribute Pods evenly across zones or nodes to improve availability.
- ConfigMaps and Secrets externalize configuration; inject them as environment variables (env with valueFrom, or envFrom) or mount them as files via a volume; Secret values are base64-encoded, not encrypted, at rest by default.
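As a sketch of how several of the bullets above combine in one manifest, the Deployment below sets a rolling-update strategy, equal requests and limits for CPU and memory (which yields QoS class Guaranteed), and all three probe types. The resource sizes and probe timings are illustrative assumptions, not tuned values; the name and image follow the 'kubectl create deployment web-app' example.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most one extra Pod during a rollout
      maxUnavailable: 0    # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        ports:
        - containerPort: 80
        resources:
          # Requests equal limits for both CPU and memory: Guaranteed QoS.
          requests:
            cpu: 250m
            memory: 128Mi
          limits:
            cpu: 250m
            memory: 128Mi
        # Liveness and readiness checks start only after the startup probe succeeds.
        startupProbe:
          httpGet:
            path: /
            port: 80
          periodSeconds: 5
          failureThreshold: 12   # allows up to 60 seconds to start
        livenessProbe:
          httpGet:
            path: /
            port: 80
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /
            port: 80
          periodSeconds: 5
```

After applying it, 'kubectl get pod <pod> -o jsonpath={.status.qosClass}' reports the QoS class that was assigned.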
Domain 3: Services and Networking
- Service types are ClusterIP (default, internal-only virtual IP), NodePort (static port 30000-32767 on every node, built on a ClusterIP), LoadBalancer (provisions an external cloud LB on top of NodePort), and ExternalName (CNAME alias).
- A NodePort forwards traffic from <NodeIP>:<port> to backend Pods regardless of which node runs them; create a ClusterIP imperatively with 'kubectl create service clusterip my-svc --tcp=80:8080' (port 80, targetPort 8080).
- CoreDNS (the default DNS since Kubernetes 1.13, configured by the 'coredns' ConfigMap in kube-system) resolves Services as <service>.<namespace>.svc.cluster.local; same-namespace Pods can use just the service name.
- A headless Service sets spec.clusterIP: None; DNS then returns the individual Pod IPs instead of a single virtual IP, which is how StatefulSets get stable per-Pod DNS records.
- kube-proxy runs on every node and programs the network rules (iptables or IPVS mode) that forward Service virtual-IP traffic to healthy backend Pods.
- An Ingress defines HTTP/HTTPS host- and path-based routing and TLS termination but does nothing on its own; you must deploy an Ingress controller (NGINX, Traefik, HAProxy) because Kubernetes ships none by default.
- Ingress path types are Exact, Prefix, and ImplementationSpecific; route an Ingress to the right controller via spec.ingressClassName (the older kubernetes.io/ingress.class annotation is deprecated).
- By default all Pod-to-Pod traffic is allowed; once any NetworkPolicy selects a Pod, that Pod becomes deny-by-default for the policy's policyTypes, and only explicitly allowed traffic is permitted; policies are additive, so the allowed traffic is the union of every policy that selects the Pod.
- A default deny-all ingress policy uses an empty podSelector ({}), policyTypes: ['Ingress'], and no ingress rules; NetworkPolicy peers are expressed with podSelector, namespaceSelector, and ipBlock.
- NetworkPolicy enforcement requires a CNI plugin that supports it (Calico, Cilium, Weave); plain Flannel ignores NetworkPolicy objects, so they silently have no effect.
- When a Service has no endpoints, the usual cause is that its selector labels do not match the Pod labels, or the target Pods are not Ready; diagnose with 'kubectl get endpoints <svc>' and 'kubectl get pods --show-labels'.
- Test connectivity by running 'kubectl exec' in a Pod and curling another Service's DNS name, or use 'kubectl run' with a temporary client image; verify DNS with nslookup or dig from inside a Pod.
- Each Pod gets its own IP address, unique across the cluster, and all Pods can reach each other without NAT; containers within a Pod share the same network namespace and reach each other over localhost.
- An Ingress not routing traffic is commonly caused by a mismatch between the Ingress ingressClassName and the controller's class, a missing controller, or a backend Service with no ready endpoints.
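The NetworkPolicy bullets above can be sketched as two manifests: a default deny-all ingress policy, plus one policy that re-opens a single path. The namespace "dev", the labels app=api and app=frontend, and port 8080 are hypothetical values chosen for illustration, and enforcement still depends on a CNI plugin that supports NetworkPolicy.

```yaml
# Default deny: selects every Pod in the namespace and lists no ingress rules.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: dev
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
# Additive allow: Pods labeled app=api accept TCP 8080 from Pods labeled
# app=frontend in the same namespace. All other ingress stays denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: dev
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
```

Because policies are additive, the two can be applied in either order; the effective rule set is their union.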
Domain 4: Storage
- A PersistentVolume (PV) is a cluster-scoped piece of storage; a PersistentVolumeClaim (PVC) is a namespace-scoped request that binds to a matching PV by size, access mode, and storageClassName.
- Access modes are ReadWriteOnce (RWO, read-write by one node), ReadOnlyMany (ROX, read-only by many nodes), ReadWriteMany (RWX, read-write by many nodes), and ReadWriteOncePod (RWOP, exactly one Pod).
- Reclaim policies are Retain (keep the PV and data; it goes Released and needs manual cleanup before reuse) and Delete (remove the PV and underlying storage when the PVC is deleted); Recycle is deprecated.
- A StorageClass defines dynamic provisioning: it names a provisioner (a CSI driver or cloud plugin), parameters such as disk type, a reclaimPolicy, and a volumeBindingMode; mark one default with the storageclass.kubernetes.io/is-default-class annotation.
- A PVC that names a storageClassName for which no StorageClass (and therefore no provisioner) exists, and for which no matching pre-created PV exists, stays Pending until the StorageClass is added or an admin creates a satisfying PV.
- volumeBindingMode: WaitForFirstConsumer delays PV provisioning and binding until a Pod that uses the PVC is scheduled, so the volume is created in the right zone or node.
- By default deleting a StatefulSet does NOT delete its PVCs (protecting stateful data); the persistentVolumeClaimRetentionPolicy field (enabled by default since Kubernetes 1.27) has whenDeleted and whenScaled settings that control whether PVCs are removed on StatefulSet deletion or scale-down.
- An emptyDir volume is created empty when a Pod is assigned to a node, is shared by all containers in that Pod, and is deleted when the Pod is removed from the node.
- The subPath field mounts a single file or subdirectory from a volume instead of the whole volume root, useful for mounting one key from a ConfigMap or Secret without clobbering the rest of the target directory.
- Mount a ConfigMap or Secret as files via a configMap or secret volume; use the items field with key and path to project specific keys, and apply restrictive file permissions (defaultMode) for sensitive data.
- Updates to a mounted ConfigMap or Secret propagate to the Pod's files after roughly the kubelet sync period (about one minute) plus cache delay; values consumed as environment variables do NOT update without a Pod restart.
- A TLS secret (type kubernetes.io/tls, created with 'kubectl create secret tls') holds a cert and key and is referenced by name (secretName) in Ingress TLS configuration.
- CSI (Container Storage Interface) is the modern plugin model: CSI drivers support volume snapshots and cloning and are developed and versioned independently of the Kubernetes release cycle.
- When a Pod is rescheduled, its PVCs are retained and reattached, so stateful data persists across Pod restarts and node moves as long as the access mode permits the move.
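The dynamic-provisioning bullets above can be sketched as a StorageClass, a PVC, and a Pod that consumes the claim. The provisioner name csi.example.com is a placeholder, not a real driver; substitute the CSI driver installed in your cluster. The object names and the 1Gi size are also illustrative assumptions.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-local
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.example.com        # hypothetical placeholder driver name
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast-local
  resources:
    requests:
      storage: 1Gi
---
# With WaitForFirstConsumer, the claim stays Pending until this Pod is scheduled.
apiVersion: v1
kind: Pod
metadata:
  name: data-consumer
spec:
  containers:
  - name: app
    image: nginx:1.25
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-claim
```

With this binding mode, a Pending PVC before the Pod exists is expected behavior rather than a fault; 'kubectl describe pvc data-claim' shows an event saying it is waiting for the first consumer.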
Domain 5: Troubleshooting
- CrashLoopBackOff means a container repeatedly starts then exits, and the kubelet applies an exponential back-off (capped at five minutes) before each restart; inspect with 'kubectl logs <pod> --previous' to see the crashed container's output.
- ImagePullBackOff and ErrImagePull mean the kubelet could not pull the image, so the container never started; causes include a wrong image name or tag, a missing or incorrect imagePullSecret for a private registry, or registry network problems.
- A Pod stuck Pending usually means the scheduler found no node that fits its resource requests, or a taint blocks it, or an unbound PVC is referenced; run 'kubectl describe pod <name>' and read the Events section.
- The scheduler fits Pods using requests (not limits); if a Pod's CPU or memory request exceeds the allocatable capacity of every node, it stays Pending until capacity frees up or the request is lowered.
- Use 'kubectl logs <pod> -c <container>' to read a specific container's stdout/stderr in a multi-container Pod; add --previous for a crashed instance and -f to follow.
- 'kubectl exec -it <pod> -- /bin/sh' opens an interactive shell in a running container for live debugging; for distroless containers use 'kubectl debug' with an ephemeral container instead.
- For a node showing NotReady, run 'kubectl describe node <node>' to read its conditions (Ready, MemoryPressure, DiskPressure, PIDPressure) and events, then check that the kubelet service is running and review its logs (journalctl -u kubelet).
- Cluster-level problems surface in events: 'kubectl get events -n <namespace> --sort-by=.lastTimestamp' shows scheduling, image, and mount failures across the namespace.
- A container terminated with reason OOMKilled (exit code 137) exceeded its memory limit and was killed by the kernel; the fix is to raise the memory limit or reduce the application's memory use.
- A liveness probe that restarts a slow-starting app prematurely is fixed by increasing initialDelaySeconds (or adding a startupProbe) so the probe waits past the app's startup time.
- Empty Service endpoints point at a selector/label mismatch or unready Pods; confirm the target Pods are Running and passing readiness probes and check 'kubectl get endpoints <service>'.
- Debug DNS by running nslookup or dig from inside any Pod against a Service name; verify CoreDNS Pods are Running in kube-system and that the Pod's /etc/resolv.conf points at the cluster DNS service IP.
- kubectl verbs for state: 'get' lists, 'describe' shows events and detailed status, 'logs' reads container output, and 'events' lists recent cluster events (add --watch to stream them); describe and logs are the two fastest first steps for almost any failure.
- Prepare a node for maintenance with 'kubectl drain <node> --ignore-daemonsets --delete-emptydir-data', fix the underlying issue, then run 'kubectl uncordon <node>' to allow scheduling again.
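The empty-endpoints case above comes down to three values that have to agree, sketched below: the Service selector, the Pod template labels, and the port the container actually listens on. The image registry.example.com/web:1.0 and port 8080 are hypothetical; any image that listens on the targetPort would do.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web                # must equal the Pod template label below
  ports:
  - port: 80
    targetPort: 8080        # must match the port the container listens on
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web            # a typo here leaves the Service with no endpoints
    spec:
      containers:
      - name: web
        image: registry.example.com/web:1.0   # hypothetical image listening on 8080
        ports:
        - containerPort: 8080
        # Pods that fail this probe are dropped from the Service endpoints.
        readinessProbe:
          tcpSocket:
            port: 8080
          periodSeconds: 5
```

If 'kubectl get endpoints web' shows no addresses, compare the selector against 'kubectl get pods --show-labels' first, then check readiness.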
CKA exam tips
- The exam is hands-on in live clusters, not multiple choice; practice solving tasks end to end in a real terminal and budget your 120 minutes across roughly 15-20 tasks, deferring high-effort, low-weight tasks until the rest are done.
- Set up speed aliases and context immediately: 'alias k=kubectl', enable kubectl autocompletion, and always run 'kubectl config use-context <ctx>' at the start of each task since each question may target a different cluster.
- Lean on imperative commands and 'kubectl create ... --dry-run=client -o yaml > file.yaml' to scaffold manifests fast, then edit; memorize 'kubectl explain <resource>.<field>' to recall spec structure without leaving the terminal.
- The official Kubernetes documentation (kubernetes.io/docs) is allowed during the exam in one extra browser tab; practice navigating it quickly and finding pages through the site search rather than relying on bookmarks, since only the permitted domains are open.
- Always verify your work after each task with get, describe, and logs, and confirm Pods reach Running/Ready; a task that looks done but leaves a Pod Pending or a Service with no endpoints earns no points.
Study guide FAQ
Is the CKA exam multiple choice or hands-on?
It is entirely performance-based. You are given a remote terminal connected to several live Kubernetes clusters and must complete real administrative tasks (creating resources, fixing broken clusters, upgrading, backing up etcd) within 120 minutes. There are no multiple-choice questions.
What score do I need to pass and how long is the certification valid?
You need a score of 66% or higher to pass. The certification is valid for two years from the date you pass, after which you must recertify to keep your CKA active.
Can I use the documentation during the exam?
Yes. You may open one additional browser tab to the allowed sites, which include kubernetes.io/docs (and its subdomains) and a few related project docs. You cannot use general web search, notes, or other resources, so practice finding YAML examples quickly within the official docs.
Which Kubernetes version does the exam use, and how should I prepare?
The CKA tracks a recent Kubernetes minor release (the exam environment is updated periodically and generally follows new releases closely), so confirm the current version on the official curriculum before testing. Prepare by repeatedly building and breaking clusters with kubeadm, practicing every kubectl task by hand, and timing yourself so command fluency becomes automatic.