Rotating AKS Cluster Certificates

Rafael Medeiros
Mar 17, 2022

Recently, a client called me to say they could no longer access their AKS cluster.

The Problem

We checked the cluster's “Diagnose and solve problems” page in the Azure portal and found the following:

Certificate Auto Rotation Not Available

The client also reported a certificate error whenever they ran kubectl commands against the cluster:

kubectl get pods -A
Unable to connect to the server: x509: certificate has expired or is not yet valid

The two findings are related: certificate auto-rotation was not enabled on the cluster, so when the API server certificate expired (which the client confirmed), nothing renewed it and it stayed expired.
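
Before rotating anything, it can help to confirm which certificate actually expired. The sketch below is one way to do that, assuming kubectl and openssl are installed and the current kubeconfig context points at the affected cluster; server_to_hostport and apiserver_cert_dates are hypothetical helper names, not part of any Azure tooling. It reads the API server URL from kubeconfig and prints the validity dates of the certificate that server presents.

```shell
#!/bin/sh
# Sketch: inspect the certificate presented by the cluster's API server.
# Assumes kubectl and openssl are on PATH; helper names are hypothetical.

# Turns an API server URL (as stored in kubeconfig) into host:port,
# defaulting to port 443 when the URL has none.
server_to_hostport() {
  hp="${1#https://}"   # drop the scheme
  hp="${hp%%/*}"       # drop any path
  case "$hp" in
    *:*) printf '%s\n' "$hp" ;;
    *)   printf '%s:443\n' "$hp" ;;
  esac
}

# Prints notBefore/notAfter of the certificate served by the API server of
# the current kubeconfig context. s_client does not abort on an expired
# certificate by default, so this still works while kubectl is failing.
apiserver_cert_dates() {
  server="$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')"
  hostport="$(server_to_hostport "$server")"
  openssl s_client -connect "$hostport" -servername "${hostport%:*}" </dev/null 2>/dev/null \
    | openssl x509 -noout -dates
}
```

Running apiserver_cert_dates prints a notBefore= and a notAfter= line; a notAfter date in the past points at the certificate kubectl is rejecting.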

The Solution

To fix the expired certificate, we need to rotate the certificates manually with the following Azure CLI command:

az aks rotate-certs -g $RESOURCE_GROUP_NAME -n $CLUSTER_NAME

Be aware that this command can take up to 30 minutes to complete. Once it finishes, the cluster is ready to accept kubectl commands again.
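
If you would rather not keep a terminal blocked for half an hour, one option is to start the rotation with --no-wait and poll the cluster's provisioningState until it settles. This is a minimal sketch under a few assumptions: wait_for_aks_ready is a hypothetical helper, the 60-second interval and 45-check ceiling are arbitrary choices rather than Azure recommendations, and you should confirm with az aks rotate-certs --help that your CLI version accepts --no-wait and --yes.

```shell
#!/bin/sh
# Sketch: poll an AKS cluster until its provisioning state is terminal.
# wait_for_aks_ready is a hypothetical helper; interval/ceiling are arbitrary.
#   $1 resource group, $2 cluster name,
#   $3 seconds between checks (default 60), $4 max checks (default 45)
wait_for_aks_ready() {
  rg="$1"; cluster="$2"; interval="${3:-60}"; max_checks="${4:-45}"
  i=1
  while [ "$i" -le "$max_checks" ]; do
    # Sleep first so the first read is less likely to see the pre-operation state
    sleep "$interval"
    state="$(az aks show -g "$rg" -n "$cluster" --query provisioningState -o tsv)"
    echo "check $i: provisioningState=$state"
    case "$state" in
      Succeeded)       return 0 ;;
      Failed|Canceled) return 1 ;;
    esac
    i=$((i + 1))
  done
  echo "gave up after $max_checks checks" >&2
  return 1
}
```

Usage:

az aks rotate-certs -g $RESOURCE_GROUP_NAME -n $CLUSTER_NAME --no-wait --yes
wait_for_aks_ready "$RESOURCE_GROUP_NAME" "$CLUSTER_NAME"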

If kubectl then fails with a message like this:

Unable to connect to the server: x509: certificate signed by unknown authority<..>

your local kubeconfig still trusts the old cluster CA, so you need to re-authenticate to the cluster with the following command:

az aks get-credentials -g $RESOURCE_GROUP_NAME -n $CLUSTER_NAME --overwrite-existing

After that, you can issue commands to the cluster again:

kubectl get nodes
NAME                           STATUS   ROLES   AGE   VERSION
aks-linux-XXXXXXX-vmss000004   Ready    agent   51d   v1.20.9
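
The two recovery steps above (refreshing the kubeconfig, then checking that kubectl works) can be wrapped into one helper so that a failure is obvious. A minimal sketch: refresh_kubeconfig is a hypothetical name, and the only commands it runs are the ones already shown in this post.

```shell
#!/bin/sh
# Sketch: refresh kubeconfig after a certificate rotation and verify access.
# refresh_kubeconfig is a hypothetical helper; $1 resource group, $2 cluster.
refresh_kubeconfig() {
  rg="$1"; cluster="$2"
  # Replace the kubeconfig entry that still trusts the old cluster CA
  if ! az aks get-credentials -g "$rg" -n "$cluster" --overwrite-existing; then
    echo "could not fetch credentials for $cluster" >&2
    return 1
  fi
  # A cheap authenticated call: a non-zero exit means access is still broken
  if ! kubectl get nodes >/dev/null; then
    echo "API server still rejects the refreshed credentials" >&2
    return 1
  fi
  echo "kubectl access restored for $cluster"
}
```

Usage: refresh_kubeconfig "$RESOURCE_GROUP_NAME" "$CLUSTER_NAME"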

As for the auto-rotation feature, it is quite new, which explains why it was not enabled on the client's cluster: according to Microsoft's announcement, it became generally available for all users at the end of February 2022, and the client's cluster had been created months before that.

To enable the feature, the documentation says to upgrade the cluster to the latest version. We did that, but it did not seem to take effect right away (see the update at the end of this post). The documentation also suggests re-creating the cluster, since every newly created cluster has the feature available. The client decided to keep the existing cluster rather than re-create it.
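
For the upgrade route, the Azure CLI can list the versions a cluster is allowed to move to. The helper below is a hedged sketch: latest_aks_upgrade is a hypothetical name, and it simply picks the highest version that az aks get-upgrades reports for the cluster, which lists upgrade targets for the current version rather than every Kubernetes release.

```shell
#!/bin/sh
# Sketch: print the highest Kubernetes version AKS offers as an upgrade
# target for a cluster. latest_aks_upgrade is a hypothetical helper.
#   $1 resource group, $2 cluster name
latest_aks_upgrade() {
  az aks get-upgrades -g "$1" -n "$2" \
    --query 'controlPlaneProfile.upgrades[].kubernetesVersion' -o tsv \
    | sort -t. -k1,1n -k2,2n -k3,3n \
    | tail -n 1
}
```

Usage (prints nothing, and skips the upgrade, when no upgrade is available):

target="$(latest_aks_upgrade "$RESOURCE_GROUP_NAME" "$CLUSTER_NAME")"
[ -n "$target" ] && az aks upgrade -g $RESOURCE_GROUP_NAME -n $CLUSTER_NAME --kubernetes-version "$target" --yes

Sorting on each dot-separated field numerically keeps 1.10.x above 1.9.x, which a plain text sort would get wrong.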

After the rotation, all certificates were renewed and are valid for 2 years, except for the CA itself, which is valid for 30 years.
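
To check those validity periods yourself, you can decode the certificates stored in your kubeconfig and ask openssl whether they stay valid for a given number of days. This sketch assumes certificate-based credentials (the client-certificate-data field may be absent when a cluster uses Azure AD token authentication), and cert_valid_for_days and kubeconfig_field_to_pem are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: check how long kubeconfig certificates remain valid.
# Assumes openssl and base64 are on PATH; helper names are hypothetical.

# Exit status 0 if the PEM certificate in $1 is still valid $2 days from now.
cert_valid_for_days() {
  # -checkend N exits 0 if the certificate will NOT expire within N seconds
  openssl x509 -in "$1" -noout -checkend "$(($2 * 86400))" >/dev/null
}

# Decodes a base64 field of the current kubeconfig context into file $2.
# $1 is a kubectl jsonpath, e.g. '{.clusters[0].cluster.certificate-authority-data}'.
kubeconfig_field_to_pem() {
  kubectl config view --raw --minify -o jsonpath="$1" | base64 -d > "$2"
}
```

Usage:

kubeconfig_field_to_pem '{.clusters[0].cluster.certificate-authority-data}' ca.crt
kubeconfig_field_to_pem '{.users[0].user.client-certificate-data}' client.crt
openssl x509 -in ca.crt -noout -enddate
cert_valid_for_days client.crt 700 && echo "client certificate valid for 700+ more days"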

For more about certificate rotation, see the official AKS documentation.

That’s it for today. I hope this lesson learned helps someone fix their own cluster. See you in the next story!

Update:

Checking the client's cluster two days later, I could see that the upgrade had in fact fixed the issue and enabled the auto-rotation feature.


Rafael Medeiros

DevOps Engineer | 3x Azure | CKA | Terraform Fanatic | Another IT Professional willing to help the community