Kubernetes Series Part 7: Scaling Your Applications with Horizontal Pod Autoscaler (HPA)

In our previous post, we discussed how to manage configuration data and secrets within Kubernetes. Now, let’s explore how to scale your applications dynamically using the Horizontal Pod Autoscaler (HPA).

Understanding Horizontal Pod Autoscaler (HPA)

The Horizontal Pod Autoscaler (HPA) automatically scales the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed metrics such as CPU utilization, memory usage, or custom metrics. This ensures that your application can handle varying loads efficiently without manual intervention.

How HPA Works

HPA continuously monitors the specified metrics of your application and adjusts the number of replicas to maintain the desired performance. It uses the Kubernetes Metrics Server to collect resource utilization data and makes scaling decisions based on predefined thresholds.
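
Concretely, the scaling decision boils down to a ratio between the observed metric and the target, as described in the Kubernetes autoscaling documentation:

desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)

For example (illustrative numbers only): with 2 replicas averaging 30% CPU against a 10% target, the HPA computes ceil(2 * 30 / 10) = 6 desired replicas, which is then clamped to the configured maxReplicas.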

Hands-on Labs

Prerequisites

Ensure that the Metrics Server is installed in your cluster, since the HPA relies on it for resource metrics. On Minikube you can enable it as an addon:

minikube addons enable metrics-server
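
Before moving on, you can confirm the Metrics Server is up and already serving data (it may take a minute after enabling the addon):

kubectl get deployment metrics-server -n kube-system
kubectl top nodes

If kubectl top nodes returns CPU and memory figures, the HPA will be able to read Pod metrics as well.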

Lab 1: Create a Deployment

First, create a Deployment and a matching Service for your application. For this example, we’ll use Nginx. Note that the container declares a CPU request; utilization-based HPA computes the percentage against that request, so it must be set:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hpa
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-hpa
  template:
    metadata:
      labels:
        app: nginx-hpa
    spec:
      containers:
      - name: nginx-hpa
        image: nginx:latest
        resources:
          requests:
            cpu: 100m
          limits:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-hpa-service
spec:
  selector:
    app: nginx-hpa
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
    name: http
  type: ClusterIP

Save the manifest as nginx-deployment.yaml and apply it:

kubectl apply -f nginx-deployment.yaml
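
As a quick check before wiring up the autoscaler, confirm that the Deployment, its Pod, and the Service were created:

kubectl get deployment nginx-hpa
kubectl get pods -l app=nginx-hpa
kubectl get service nginx-hpa-service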

Lab 2: Create an HPA Resource

Create an HPA resource that scales the Nginx Deployment between 1 and 3 replicas based on CPU utilization. The target average utilization is deliberately low (10%) so the demo triggers scaling quickly:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-hpa
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 10

Save the HPA manifest as nginx-hpa.yaml and apply it:

kubectl apply -f nginx-hpa.yaml
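
As an aside, a roughly equivalent autoscaler can also be created imperatively with kubectl autoscale; the manifest above is preferable for version control, but the one-liner is handy for quick experiments:

kubectl autoscale deployment nginx-hpa --cpu-percent=10 --min=1 --max=3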

Testing HPA

To test the HPA, generate load against the Nginx Service from a temporary busybox Pod started with kubectl run:

kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh

Inside the load-generator Pod, run the following loop to send a continuous stream of requests to the Nginx Service:

while true; do wget -q -O- http://nginx-hpa-service.default.svc.cluster.local; done

The domain nginx-hpa-service.default.svc.cluster.local is the DNS name Kubernetes assigns to the Service so that Pods can reach it from anywhere in the cluster. Here’s a breakdown of its components (a quick resolution check follows the list):

  • nginx-hpa-service: This is the name of the Kubernetes Service you created.
  • default: This is the namespace where the Service is located. If you didn’t specify a namespace, it defaults to default.
  • svc: This indicates that the DNS name is for a Service.
  • cluster.local: This is the default domain for services within the cluster.
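
As a sanity check, you can resolve this name from inside the load-generator Pod (busybox ships with a minimal nslookup):

nslookup nginx-hpa-service.default.svc.cluster.local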

Kubernetes runs an internal cluster DNS service (CoreDNS in current clusters, including Minikube) that resolves these names to the ClusterIP of the matching Service, which is how Pods find and communicate with each other without hard-coding IP addresses.

While the load generator is running, monitor the HPA status from another terminal:

kubectl get hpa

As CPU utilization climbs above the 10% target, you should see the HPA increase the replica count (up to the maximum of 3). When you stop the load generator, the replica count drops back down after the scale-down stabilization window (five minutes by default).
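
To follow the scale-up and scale-down as they happen, you can watch the HPA and its Pods:

kubectl get hpa nginx-hpa -w
kubectl get pods -l app=nginx-hpa -w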

Conclusion

The Horizontal Pod Autoscaler is a powerful feature in Kubernetes that helps you maintain optimal performance and resource utilization for your applications. By automatically adjusting the number of Pods based on real-time metrics, HPA ensures that your applications can handle varying loads efficiently.