Autoscaling Containers on Kubernetes on AWS

One of the challenges I faced recently was autoscaling the containers on my Kubernetes cluster. I realised I had not yet written about this concept, so I thought I would share how it can be done and what the pitfalls were for me.

If you combine this concept with my previous post about autoscaling your Kubernetes cluster (https://renzedevries.wordpress.com/2017/01/10/autoscaling-your-kubernetes-cluster-on-aws/), you can create a nicely balanced, scalable deployment at lower cost.

Preparing your cluster

In my case I have used kops to create my cluster in AWS. However, by default it does not install some of the add-ons we need for autoscaling our workloads, such as Heapster.

Heapster monitors and analyses the resource usage in our cluster. The metrics it collects are essential for building scaling rules; they allow us, for example, to scale based on a CPU percentage. Heapster records these metrics and offers an API to Kubernetes so it can act on this data.

In order to deploy Heapster I used the following command:

kubectl create -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/monitoring-standalone/v1.3.0.yaml

Please note that in your own Kubernetes setup you might already have Heapster, or you might want to run a different version.
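Once the Heapster pods are up, a quick way to verify that metrics are being collected is with kubectl top, which reads the data Heapster exposes (it can take a minute or two before the first samples appear):

kubectl top nodes
kubectl top pods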

Optional dashboard
I also find it handy to run the Kubernetes dashboard, which you can deploy as follows under kops:

kubectl create -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/kubernetes-dashboard/v1.5.0.yaml
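Once it is deployed, one way to reach the dashboard is through a local proxy; with the proxy running, this version of the dashboard should be reachable at http://localhost:8001/ui:

kubectl proxy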

Deploying the Workload

To get started I will deploy a simple workload; in this case it's the command service for my robotics framework (see previous posts). This is a simple HTTP REST endpoint that takes in JSON data and passes it along to a message queue.

This is the descriptor of the deployment object for Kubernetes:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: command-svc
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: command-svc
    spec:
      containers:
      - name: command-svc
        image: ecr.com/command-svc:latest
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        readinessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 10
        env:
        - name: amq_host
          value: amq
        - name: SPRING_PROFILES_ACTIVE
          value: production

Readiness and Liveness
I have added liveness and readiness probes to the container, which allow Kubernetes to detect when a container is ready and whether it is still alive. This is important for autoscaling because otherwise pods that are not actually ready to accept work might already be enabled in your load-balanced service. By default, Kubernetes can only detect whether a pod has started, not whether the process inside it is ready to accept workloads.

These probes test whether a certain condition is true, and only then is the pod added to the load-balanced service. In my case the probe checks whether port 8080 of my REST service is available. I am using a simple TCP probe because the HTTP probe that is also offered gave me strange errors, and the TCP probe works just as well for my purpose.
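For reference, if the HTTP probe does work in your setup, the readiness probe would look roughly like the sketch below; note that the /health path is a hypothetical endpoint that your service would need to expose:

readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10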

Deploying
Now we are ready to deploy the workload, which we do as follows:

kubectl create -f command-deployment.yaml
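I will not go into the full definition of the load-balanced service that sits in front of these pods, but a minimal sketch would look something like the following; the service name, ports and LoadBalancer type (which provisions an ELB on AWS) may differ in your setup:

apiVersion: v1
kind: Service
metadata:
  name: command-svc
spec:
  type: LoadBalancer
  selector:
    app: command-svc
  ports:
  - port: 8080
    targetPort: 8080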


Enabling Autoscaling
The next step is to enable autoscaling rules on our workload. As mentioned above, we have deployed Heapster, which monitors resource usage. I have also set resource constraints on the pods to indicate how much CPU they request and may consume: each command-svc pod requests 250m (roughly a quarter of a CPU core) and is limited to 500m. The autoscaler measures CPU percentage against the request, so a rule that scales at 80% CPU triggers once average usage exceeds 80% of the 250m request, i.e. 200m. Because the 500m limit is higher than the request, the reported percentage can climb well above 100%, as you will see below.

We can create a rule that says there is always a minimum of 1 pod and a maximum of 3, and we scale up once the CPU usage exceeds 80% of what the pods request.

kubectl autoscale deployment command-svc --cpu-percent=80 --min=1 --max=3
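This command creates a HorizontalPodAutoscaler object behind the scenes. If you prefer to keep the rule in version control, the equivalent declarative manifest looks like this:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: command-svc
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: command-svc
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 80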

We can ask for information about the autoscaler with the following command and monitor the scaling changes as they happen:

kubectl get hpa -w
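For more detail, including the scaling events that have been triggered, you can also describe the autoscaler:

kubectl describe hpa command-svc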

Creating a load

I have deployed the command-svc pod and want to simulate a load with a simple tool. For this I simply resorted to Apache JMeter; it's not a perfect tool, but it works well and, most importantly, it's free. I created a simple thread group with 40 users doing 100k requests against the command-svc from my desktop.
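If you don't want to set up JMeter, even a crude shell loop can generate enough traffic to trigger scaling; the endpoint, path and payload below are placeholders for whatever your service actually expects:

while true; do
  curl -s -X POST -H "Content-Type: application/json" \
    -d '{"key":"value"}' http://<service-endpoint>:8080/<path> > /dev/null
done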

This is the result when monitoring the autoscaler:

command-svc   Deployment/command-svc   1% / 80%   1         3         1          4m
command-svc   Deployment/command-svc   39% / 80%   1         3         1         6m
command-svc   Deployment/command-svc   130% / 80%   1         3         1         7m
command-svc   Deployment/command-svc   130% / 80%   1         3         1         7m
command-svc   Deployment/command-svc   130% / 80%   1         3         2         7m
command-svc   Deployment/command-svc   199% / 80%   1         3         2         8m
command-svc   Deployment/command-svc   183% / 80%   1         3         2         9m
command-svc   Deployment/command-svc   153% / 80%   1         3         2         10m
command-svc   Deployment/command-svc   76% / 80%   1         3         2         11m
command-svc   Deployment/command-svc   64% / 80%   1         3         2         12m
command-svc   Deployment/command-svc   67% / 80%   1         3         2         13m
command-svc   Deployment/command-svc   91% / 80%   1         3         2         14m
command-svc   Deployment/command-svc   91% / 80%   1         3         2         14m
command-svc   Deployment/command-svc   91% / 80%   1         3         3         14m
command-svc   Deployment/command-svc   130% / 80%   1         3         3         15m
command-svc   Deployment/command-svc   133% / 80%   1         3         3         16m
command-svc   Deployment/command-svc   130% / 80%   1         3         3         17m
command-svc   Deployment/command-svc   126% / 80%   1         3         3         18m
command-svc   Deployment/command-svc   118% / 80%   1         3         3         19m
command-svc   Deployment/command-svc   137% / 80%   1         3         3         20m
command-svc   Deployment/command-svc   82% / 80%   1         3         3         21m
command-svc   Deployment/command-svc   0% / 80%   1         3         3         22m
command-svc   Deployment/command-svc   0% / 80%   1         3         3         22m
command-svc   Deployment/command-svc   0% / 80%   1         3         1         22m

You can also see that it neatly scales down again at the end once the load goes away. Note that the autoscaler deliberately waits a few minutes before removing pods, which is why the replica count stays at 3 for a while after usage drops to 0%; this avoids thrashing when the load fluctuates.

Pitfalls

I have noticed a few things about the autoscaling that are important to take into account:
1. The CPU percentage is based on the resource requests you define in your pods; if you don't set a CPU request, the autoscaler won't work as expected.
2. Make sure to have readiness and liveness probes on your containers, otherwise pods that are not yet ready might already get hit with external requests.
3. For some reason I could only get TCP probes to work on AWS; HTTP probes failed for me with timeout exceptions, and I am unsure why.

Conclusion

I hope this post helps you get the ultimate autoscaling setup for both your workloads and your cluster. Combined with the cluster autoscaler described in my previous post, this is a very powerful and dynamic setup on AWS: https://renzedevries.wordpress.com/2017/01/10/autoscaling-your-kubernetes-cluster-on-aws/
