Kubernetes — Pod Resources Management With HPA and VPA

Rafael Medeiros
5 min read · Oct 2, 2024

At the beginning of your Kubernetes journey, it’s common to focus primarily on deploying your pods without much thought to resource management. However, understanding how to scale resources effectively can lead to significant cost savings.

A good feature to start with is the Metrics Server, which lets you check how much memory and CPU is being used by your pods and nodes:

kubectl top nodes
kubectl top pods
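
If these commands return an error, the Metrics Server may not be installed in your cluster. A common way to install it (assuming you want the official release manifest from the kubernetes-sigs repository) is:

```shell
# Install the Metrics Server from the official release manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```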

Pod Scaling Management

Pod scaling management involves adjusting the number of pod replicas in response to demand, ensuring optimal resource utilization.

Horizontal Pod Autoscaler

The Horizontal Pod Autoscaler (HPA) is a critical tool in Kubernetes that automatically adjusts the number of pods in a deployment based on CPU or memory utilization. By dynamically scaling the number of pods, the HPA ensures that your application can handle varying loads efficiently.

By setting up the HPA correctly, you can ensure that your application scales seamlessly to meet demand without manual intervention.

If you want to follow along with this lab, you can use minikube to do so.
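
On minikube, the Metrics Server is available as an addon (assuming a reasonably recent minikube version):

```shell
# Start a local cluster and enable the Metrics Server addon
minikube start
minikube addons enable metrics-server
```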

I deployed an app called webapp1:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: webapp1
  template:
    metadata:
      labels:
        app: webapp1
    spec:
      containers:
      - image: vamin2/node-example
        name: nginx

The HPA is as follows:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: webapp1
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp1                     # the deployment that I want to scale
  minReplicas: 2
  maxReplicas: 5
  targetCPUUtilizationPercentage: 10  # scale out when average CPU usage exceeds 10%

I then used ab (ApacheBench) to stress the webapp1 deployment:

ab -c 2 -n 10000000 http://10.244.0.27:3000/
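
While the load test runs, you can watch the HPA react (webapp1 is the HPA name from the manifest above):

```shell
# Watch current vs. target CPU utilization and the replica count change
kubectl get hpa webapp1 --watch

# In another terminal, watch the new pods being created
kubectl get pods -l app=webapp1 --watch
```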

During the first few requests, only one pod handles the load. But once pod 1 goes beyond the 10% CPU threshold, the HPA scales out another pod to fulfill the requests.

ArgoCD vs. Horizontal Pod Autoscaler — Known Race Condition

If you use ArgoCD, whenever the Horizontal Pod Autoscaler (HPA) adjusts the number of replicas, ArgoCD detects the change as configuration drift and reverts the deployment to the state declared in Git. This tug-of-war can interfere with scaling and cause the replica count to flap.

To fix that, we have to ensure that ArgoCD will ignore the replicas field:

ignoreDifferences:
- group: apps
  kind: Deployment
  name: auth-api-service
  namespace: dev
  jsonPointers:
  - /spec/replicas

And let the Horizontal Pod Autoscaler do its job.
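
For context, ignoreDifferences is a field of the Argo CD Application spec. A minimal sketch of where it lives (the application name, repo URL, and path below are placeholders, not values from this lab):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: auth-api-service            # placeholder application name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/manifests.git   # placeholder repository
    path: manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: dev
  ignoreDifferences:                # skip drift detection on this field
  - group: apps
    kind: Deployment
    name: auth-api-service
    namespace: dev
    jsonPointers:
    - /spec/replicas
```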

Vertical Pod Autoscaler

The Vertical Pod Autoscaler (VPA) is a tool designed to optimize the performance of Kubernetes clusters by dynamically adjusting the resource requests and limits of individual pods. It automatically scales the resource requirements of pods based on their actual usage, ensuring that each pod receives the necessary amount of CPU and memory to operate efficiently.

We can use the following modes on VPA:

Off: The Vertical Pod Autoscaler (VPA) will provide recommendations without automatically adjusting resource requirements.

Initial: VPA assigns resource requests only at the time of pod creation and does not modify them afterward.

Recreate: VPA sets resource requests during pod creation and updates existing pods by evicting and recreating them.

Auto: VPA automatically applies its recommendations, currently by evicting and recreating pods (today this behaves like Recreate).

You can install VPA using the following:

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
git checkout origin/vpa-release-1.0
REGISTRY=registry.k8s.io/autoscaling TAG=1.0.0 ./hack/vpa-process-yamls.sh apply

You can check that the VPA controller deployments were installed:
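
Assuming the default install, the components land in the kube-system namespace:

```shell
# The official install creates three components in kube-system
kubectl get deployments -n kube-system | grep vpa
# Expect: vpa-admission-controller, vpa-recommender, vpa-updater
```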

Now, we need to create a VPA object for a specific deployment, in our case webapp1 again:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: webapp1-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: webapp1          # targeting webapp1
  updatePolicy:
    updateMode: "Off"      # monitoring only
  resourcePolicy:
    containerPolicies:
    - containerName: "nginx"
      minAllowed:
        cpu: "25m"
        memory: "25Mi"
      maxAllowed:
        cpu: "500m"
        memory: "600Mi"

With updateMode set to "Off", the VPA only issues recommendations; it never modifies the pods.

If I get the VPA some time after applying it, I can see that it is working: it is tracking the CPU and memory usage of that container.
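
A quick way to check (webapp1-vpa is the name from the manifest above):

```shell
# Lists the VPA along with its mode and current target recommendation
kubectl get vpa webapp1-vpa
```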

If I describe the VPA now, I can see the recommendation:

kubectl describe vpa webapp1-vpa
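
The recommendation appears in the object's status. The field names below match the describe output; the values are illustrative, not from this lab:

```yaml
# Excerpt of `kubectl describe vpa webapp1-vpa` (illustrative values)
Recommendation:
  Container Recommendations:
    Container Name:  nginx
    Lower Bound:
      Cpu:     25m
      Memory:  25Mi
    Target:          # what VPA would set the requests to
      Cpu:     25m
      Memory:  262144k
    Upper Bound:
      Cpu:     500m
      Memory:  600Mi
```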

HPA and VPA — Potential Race Condition?

Running both autoscalers against the same deployment can make them work against each other:

  1. Resource Adjustments: If VPA increases the resource requests of a pod, HPA might trigger a scale-up based on the updated usage, potentially leading to a situation where HPA scales out more pods than needed.
  2. Reconciliation Loop: Both autoscalers operate on different cycles. HPA might scale up due to spikes in usage while VPA is still adjusting the resources, which can result in temporary inefficiencies.

Best Practices:

  1. Mode Configuration: Set VPA to updateMode: "Off" if you want to review recommendations and apply changes manually. Use updateMode: "Auto" only if you are confident about its behavior alongside HPA.
  2. Resource Requests and Limits: Ensure that resource requests and limits are properly set, so that VPA’s adjustments don’t lead to unexpected behavior for HPA.
  3. Monitor and Test: Continuously monitor the behavior of your workloads and adjust configurations as needed. Use tools like Prometheus and Grafana to visualize resource usage and scaling behavior.
  4. Avoid High-Fluctuation Scenarios: In environments where load fluctuates heavily, be cautious about using both together, as the combination may lead to instability.
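
As a concrete note on practice 2: the HPA's utilization percentage is computed relative to the container's CPU request, so the containers must declare requests for the HPA to work predictably. A deployment snippet (the values here are illustrative, not from the lab):

```yaml
# Deployment excerpt: HPA utilization targets are relative to these requests
    spec:
      containers:
      - name: nginx
        image: vamin2/node-example
        resources:
          requests:
            cpu: 100m      # illustrative; a 10% HPA target then means ~10m of CPU
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi
```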

Conclusion

By utilizing tools such as the Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA), you can dynamically adjust pod scaling based on workload demands.

However, it’s important to be mindful of potential race conditions between these autoscalers and to configure them appropriately to avoid conflicts.

Monitoring and testing your configurations will ensure that your applications run efficiently and scale as needed.

Follow me on LinkedIn

Clap if you liked this content!

Written by Rafael Medeiros

DevOps Engineer | CNCF Kubestronaut | 3x Azure | Terraform Fanatic | Another IT Professional willing to help the community
