
Debugging a Crashed API Server in Kubernetes

4 min read · Mar 11, 2025

Your cluster is broken, and your API server pod won’t come back up. You might notice that the kube-apiserver container appears for a moment and then disappears, or it might not show up at all when running docker ps or crictl ps. Logs might be difficult to access, making troubleshooting even harder.

In this guide, you’ll learn:
— How to diagnose a failing Kubernetes API server
— The most common reasons why the API server won’t start
— Step-by-step troubleshooting techniques to bring it back online

Let’s get started!

Understanding Kubernetes API Server Failures

The Kubernetes API server is the central component of the cluster — it’s how everything communicates. If it crashes, your cluster becomes unmanageable.

Common reasons for API server failures:
  • YAML syntax errors in the manifest
  • Invalid arguments in the API server command
  • Resource or permission issues
  • CrashLoopBackOff due to configuration problems

Step 1: Restart Kubelet to Speed Up Troubleshooting

Before diving into diagnostics, restart the kubelet so that it tries to relaunch the API server right away instead of waiting out its crash back-off delay:

systemctl restart kubelet

This ensures you don’t have to wait too long between troubleshooting steps.

Step 2: Check for YAML Manifest Errors

If there’s a syntax error in the API server’s YAML manifest, the kubelet won’t be able to start it. Run:

journalctl -fu kubelet | grep apiserver
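
The command above follows the log live. If you would rather look back over a fixed window, a minimal sketch like the following may help. The function name kubelet_apiserver_lines and the ten-minute default are illustrative assumptions, not part of the original steps.

```shell
# Hypothetical helper: show recent kubelet log lines that mention the API server.
# Usage: kubelet_apiserver_lines ["10 minutes ago"]
kubelet_apiserver_lines() {
  since="${1:-10 minutes ago}"
  # --since bounds the time window; --no-pager prints straight to the terminal
  journalctl -u kubelet --since "$since" --no-pager | grep -i apiserver
}
```

Call it with a different window, for example kubelet_apiserver_lines "1 hour ago", if the failure started earlier.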

Look for one of these three common error types:

2.1 - Could not process manifest file

Your /etc/kubernetes/manifests/kube-apiserver.yaml file has a syntax error.

How to fix it:

Open the manifest file:

vi /etc/kubernetes/manifests/kube-apiserver.yaml

Correct any YAML formatting errors.

YAML parsers stop at the first error they find, so if the issue persists, repeat this step to catch additional errors.
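
One easy-to-miss formatting error is a tab character used for indentation, which YAML does not allow. Here is a small sketch for spotting those lines before the kubelet does. The helper name find_tab_indent is hypothetical, and it catches only this one class of mistake, not every YAML error.

```shell
# Hypothetical helper: list lines whose leading whitespace contains a tab.
# Usage: find_tab_indent [manifest]
find_tab_indent() {
  manifest="${1:-/etc/kubernetes/manifests/kube-apiserver.yaml}"
  tab="$(printf '\t')"
  # -n prints line numbers; the pattern matches optional spaces followed by a tab at line start
  grep -n "^ *${tab}" "$manifest"
}
```

No output means no tab-indented lines were found; any line it prints should be re-indented with spaces.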

2.2 - Structure or argument error

The YAML is syntactically correct, but contains invalid configurations.

How to fix it:

  • Double-check that all API server arguments and volume mounts are correct.
  • Verify that every referenced path exists on the host and is mounted into the container.
  • If unsure, refer to the Kubernetes API server documentation.
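
To check the paths quickly, you could list every hostPath in the manifest and test whether it exists on the node. This is a rough sketch that assumes each volume is written as a hostPath: line followed by a path: line with an unquoted value, which is how kubeadm-generated manifests usually look. The helper name check_host_paths is hypothetical.

```shell
# Hypothetical helper: report whether each hostPath in the manifest exists on this node.
# Usage: check_host_paths [manifest]
check_host_paths() {
  manifest="${1:-/etc/kubernetes/manifests/kube-apiserver.yaml}"
  # Take the first "path:" that follows each "hostPath:" line,
  # so probe paths such as /livez are not picked up
  awk '/hostPath:/ {in_hp=1; next}
       in_hp && /^[[:space:]]*path:/ {print $2; in_hp=0}' "$manifest" |
  while read -r p; do
    if [ -e "$p" ]; then echo "OK $p"; else echo "MISSING $p"; fi
  done
}
```

A MISSING line is not always a problem: volumes with type DirectoryOrCreate or FileOrCreate are created on demand.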

2.3 - Pod shows CrashLoopBackOff

The pod starts and then fails immediately. This tells you that:

  • The YAML was parsed successfully
  • The API server started but then crashed
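
When the container exits this quickly, you can ask the container runtime for its last output directly. The sketch below assumes crictl is installed and configured on the control plane node, that it lists the newest container first, and that you run it as root. The helper name apiserver_last_logs is hypothetical.

```shell
# Hypothetical helper: print the last log lines of the most recent kube-apiserver container.
# Usage: apiserver_last_logs [number_of_lines]
apiserver_last_logs() {
  # -a includes exited containers, -q prints only IDs, --name filters by container name
  id="$(crictl ps -a -q --name kube-apiserver | head -n 1)"
  if [ -z "$id" ]; then
    echo "no kube-apiserver container found" >&2
    return 1
  fi
  crictl logs --tail "${1:-20}" "$id"
}
```

If it reports that no container was found, the kubelet never got as far as creating one, which points back to a manifest problem.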

Step 3: Find Logs for the Failing API Server

If the API server keeps crashing, the issue is likely with one or more arguments passed to kube-apiserver. The logs will help diagnose the problem.

Locate the API server’s log directory:

cd /var/log/pods
ls -ld *apiserver*

You’ll see a directory with a name like:

kube-system_kube-apiserver-controlplane_02d13ddeddf8e935ec2407132767aeaa

If multiple results appear, choose the one with the most recent timestamp.

This directory changes frequently, so repeat this step if needed.

Step 4: Read the API Server Logs

Enter the log directory

cd kube-system_kube-apiserver-controlplane_02d13ddeddf8e935ec2407132767aeaa

List available logs

ls -l

You should see a subdirectory named kube-apiserver.

Enter the subdirectory and check logs

cd kube-apiserver
ls -l

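Look for .log files. Each file is named after the container's restart count, so the highest number is the most recent one and contains the error details.

Read the latest log file (replace 1.log with the newest file in your listing):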

cat 1.log
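
Because the pod directory name changes, it can be quicker to let the shell find the newest log for you. This is a minimal sketch that assumes the directory layout shown above; the helper name latest_apiserver_log is hypothetical.

```shell
# Hypothetical helper: print the path of the most recently modified API server log.
# Usage: latest_apiserver_log [base_dir]
latest_apiserver_log() {
  base="${1:-/var/log/pods}"
  # ls -t sorts by modification time, newest first; head keeps only the first entry
  ls -t "$base"/*kube-apiserver*/kube-apiserver/*.log 2>/dev/null | head -n 1
}
```

For example, tail -n 50 "$(latest_apiserver_log)" shows the end of the newest log without any cd commands.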

What to look for in the logs:
  • Permission errors? Check RBAC policies and file permissions.
  • Missing arguments? Ensure all required flags are provided in the manifest.
  • Out of memory errors? Increase the API server’s memory limit.

Once you find the issue, fix it in the YAML manifest and restart the kubelet again:

systemctl restart kubelet

Common Issues and Fixes from Logs

TLS or Certificate Issues

Error Example:

x509: certificate signed by unknown authority

Fix:

Ensure the API server is using the correct CA certificate.

Check that the TLS certificate has not expired.

Run:

openssl x509 -noout -text -in /etc/kubernetes/pki/apiserver.crt
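
The full text dump is long, so a narrower check can be easier to read. The sketch below prints only the expiry date, tests whether the certificate has already expired, and verifies it against a CA file. The default paths assume a kubeadm-style layout under /etc/kubernetes/pki, and the helper name check_cert is hypothetical.

```shell
# Hypothetical helper: check a certificate's expiry and its signing CA.
# Usage: check_cert [cert_file] [ca_file]
check_cert() {
  cert="${1:-/etc/kubernetes/pki/apiserver.crt}"
  ca="${2:-/etc/kubernetes/pki/ca.crt}"
  # Print the expiry date (notAfter=...)
  openssl x509 -noout -enddate -in "$cert" || return 2
  # -checkend 0 exits non-zero if the certificate has already expired
  if ! openssl x509 -noout -checkend 0 -in "$cert" >/dev/null; then
    echo "EXPIRED: $cert"
    return 1
  fi
  # Confirm the certificate chains to the given CA
  if ! openssl verify -CAfile "$ca" "$cert" >/dev/null 2>&1; then
    echo "NOT SIGNED BY: $ca"
    return 1
  fi
  echo "OK: $cert"
}
```

Run it without arguments to check the API server's serving certificate, or pass other certificate and CA paths from the manifest.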

Port or Bind Address Conflicts

Error Example:

Failed to bind to port 6443: Address already in use

Fix:

Check if another process is already using port 6443:

netstat -tulnp | grep 6443

If another process is using it, stop the conflicting process or update the API server manifest to use a different port.
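
On newer distributions netstat may not be installed, in which case ss from iproute2 gives the same information. Here is a small sketch that uses whichever is available; the helper name port_listeners is hypothetical, and you need root to see process names.

```shell
# Hypothetical helper: show which process is listening on a TCP port (default 6443).
# Usage: port_listeners [port]
port_listeners() {
  port="${1:-6443}"
  if command -v ss >/dev/null 2>&1; then
    # -t TCP, -l listening sockets, -n numeric ports, -p owning process
    ss -tlnp | grep -E ":${port}[[:space:]]"
  else
    netstat -tlnp | grep -E ":${port}[[:space:]]"
  fi
}
```

No output means nothing is currently listening on that port.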

Missing or Incorrect ETCD Connection

Error Example:

failed to reach any etcd endpoint

Fix:

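Verify that etcd is running and accessible. The command below applies when etcd runs as a systemd service; on kubeadm clusters etcd usually runs as a static pod, so check for its container with crictl ps instead: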

systemctl status etcd

Check that the API server’s etcd configuration is correct in /etc/kubernetes/manifests/kube-apiserver.yaml.
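
To review that configuration at a glance, you can pull just the etcd-related flags out of the manifest. This is a minimal sketch; the helper name apiserver_etcd_flags is hypothetical.

```shell
# Hypothetical helper: print the etcd-related flags from the API server manifest.
# Usage: apiserver_etcd_flags [manifest]
apiserver_etcd_flags() {
  manifest="${1:-/etc/kubernetes/manifests/kube-apiserver.yaml}"
  # -e lets the pattern start with dashes without being parsed as an option
  grep -n -e '--etcd-' "$manifest"
}
```

Compare the --etcd-servers value with the address etcd actually listens on, and confirm that the certificate and key files named in the other flags exist.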

Preventing Future API Server Failures

Keep Backups of Working Manifests

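Before making changes to /etc/kubernetes/manifests/kube-apiserver.yaml, back up the working file. Store the copy outside the manifests directory, because the kubelet treats the files it finds there as static pod manifests:

cp /etc/kubernetes/manifests/kube-apiserver.yaml /etc/kubernetes/kube-apiserver.backup.yaml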
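
If you edit the manifest often, timestamped copies in a separate directory make it easier to roll back to a specific version. A minimal sketch follows; the helper name backup_manifest and the /etc/kubernetes/manifest-backups directory are assumptions, not Kubernetes conventions.

```shell
# Hypothetical helper: copy a manifest to a timestamped file in a separate backup directory.
# Usage: backup_manifest [manifest] [backup_dir]
backup_manifest() {
  src="${1:-/etc/kubernetes/manifests/kube-apiserver.yaml}"
  dest_dir="${2:-/etc/kubernetes/manifest-backups}"
  mkdir -p "$dest_dir" || return 1
  dest="$dest_dir/$(basename "$src").$(date +%Y%m%d-%H%M%S)"
  # -p preserves mode, ownership and timestamps; print the backup path on success
  cp -p "$src" "$dest" && echo "$dest"
}
```

To restore, copy the chosen backup over /etc/kubernetes/manifests/kube-apiserver.yaml and let the kubelet pick up the change.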

Monitor API Server Logs Proactively

Use tools like Prometheus (for metrics) and Loki (for logs) to monitor the API server in real time.

Ensure Proper Configuration Management

Use GitOps tools like ArgoCD or Flux to maintain consistent API server configurations.

Conclusion

A crashed API server can be a nightmare, but now you know how to diagnose and fix it. If you follow these steps, you’ll be able to get your Kubernetes cluster back online.

Having an issue with the API server that isn’t described here? Drop a comment below!

Want more insights on DevOps, security, and automation? Don’t miss out: follow me!

Connect with me on LinkedIn!

Written by Rafael Medeiros

DevOps Engineer | CNCF Kubestronaut | 3x Azure Certified | Cloud & Security Enthusiast | Another IT professional willing to help the community
