Autoscaling GitLab Runners on Hetzner Cloud

Max Rosin

I use a private GitLab instance a lot and rely on GitLab CI for most of my private projects. To speed up the CI jobs I wanted to have more GitLab runners available on demand. From our GitLab instance at work I already knew that GitLab supports autoscaling runners with docker-machine. At work we use this in combination with our OpenStack cloud, but for my personal use I was looking for a way to deploy GitLab runners on Hetzner Cloud. Searching the internet for a while led me to this (German) blog post, which sounded like a good approach, so I went ahead and implemented it in a very similar way. Long story short: I am extremely happy with the result! I can get as many runners as I need (up to a limit I configure) in less than a minute, and they are deleted when I don't need them anymore. This way I only pay when I use them... some would even say that this is pretty much the point of the cloud and on-demand resources 😄

The Architecture

GitLab runners can be installed on all kinds of systems; they then ask the GitLab server at regular intervals for new jobs to execute. Docker-machine takes this approach to the next level: the runner does not execute the jobs itself. Instead, additional machines are created on demand and instructed to run the jobs. If the additional machines sit idle for a while, docker-machine removes them to save resources and money. Docker-machine was originally developed by Docker, but they stopped working on the project. Luckily for us, GitLab forked it and maintains the fork with minimal patches so that we can still create autoscaling runners with it.

Docker-machine itself does not support Hetzner Cloud out of the box but someone implemented a driver for it. 🙂 We are going to use this Docker image to run docker-machine with Hetzner Cloud support. The image is based on the official runner image and adds the driver to it, which is exactly what we need.

The Runner

How and where we set up the runner is up to us. It doesn't need many resources (remember, it does not run the jobs itself), so I just run it on the same server as my GitLab instance. An easy way to get started is docker-compose, but feel free to use any other orchestrator, automation tool, or script to manage it.

A minimal docker-compose.yml file is:

version: '2'
services:
  hetzner-runner:
    image: mawalu/hetzner-gitlab-runner:latest
    volumes:
      - "./hetzner_config:/etc/gitlab-runner"

Let's get this up by running docker-compose up -d. Next, we run docker-compose run hetzner-runner register and answer a few interactive prompts. I decided to use Hetzner's cpx11 flavor with 2 CPUs and 2GB of memory and the ubuntu-20.04 image; adjust this depending on the requirements of your jobs. We can now exit the temporary container and should find a config file at ./hetzner_config/config.toml, which we can edit to configure the runner further:

concurrent = 50
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "Hetzner Autoscale"
  url = "https://git.example.com"
  token = "RUNNER TOKEN FROM GITLAB"
  executor = "docker+machine"
  limit = 10
  [runners.custom_build_dir]
  [runners.cache]
    Type = "s3"
    Shared = true
    [runners.cache.s3]
      ServerAddress = "OPTIONAL S3 PROVIDER"
      AccessKey = "OPTIONAL S3 CONFIG"
      SecretKey = "OPTIONAL S3 CONFIG"
      BucketName = "OPTIONAL S3 CONFIG"
  [runners.docker]
    tls_verify = false
    image = "ubuntu:20.04"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache", "/var/run/docker.sock:/var/run/docker.sock"]
    shm_size = 0
  [runners.machine]
    IdleCount = 0
    IdleTime = 1800
    MachineDriver = "hetzner"
    MachineName = "runner-%s"
    MachineOptions = ["hetzner-api-token=HETZNER API TOKEN", "hetzner-image=ubuntu-20.04", "hetzner-server-type=cpx11"]
  [[runners.machine.autoscaling]]
    Periods = ["* * * * * sat,sun *"]
    IdleTime = 21600

All configuration options are documented in the official docs but let's take a closer look at a few options.

  • limit = 10: The upper limit of machines created by autoscaling. We should always set this to a reasonable number for our environment. In my case I don't want to create 100 machines by accident, so I keep it at a low ten. The flip side is that if I have more than ten parallel jobs, a few of them stay pending until a runner becomes available.

  • runners.cache: This is completely optional, but if you have some S3-compatible object storage available you can configure it here. This speeds up CI jobs because the runners are able to share their cache. You can even use MinIO to set up your own S3-compatible storage if you want to. If you don't have S3 storage available, don't worry, just remove this section from the config.

  • /var/run/docker.sock: This is also optional! The Docker socket is usually required when we want to build Docker images with native Docker (docker build ...). The downside is that we bypass all the security features that we gain by using Docker in the first place, so we should only do this if we trust everyone who runs code in these runners. If you don't need the Docker socket inside your runners, remove it from the volumes list. An alternative to mounting the socket and using docker build is a Docker builder which runs in userspace and does not need the Docker daemon. For example, I tinkered around with kaniko in the past. It works, but in my experience it takes roughly twice as long as the Docker daemon to build a new image.

  • runners.machine: That's where it gets interesting. IdleCount = 0 tells docker-machine not to create any spare machines by default. This means that if there are no jobs, there won't be any machines (and no costs). The trade-off is that new jobs have to wait until machines are created, which slows down the pipeline run. Hetzner Cloud machines boot really fast, though: they are up and running in less than ten seconds, and afterwards it takes less than a minute for docker-machine to provision the runner on them. For faster results we could increase the IdleCount, e.g. IdleCount = 5, to always have at least five runners available to run new jobs immediately (though we would have to pay for five servers permanently). IdleTime = 1800 instructs docker-machine to keep idle runners around for 30 minutes before deleting them. This makes consecutive jobs faster because we don't have to wait for new machines to be provisioned.

  • runners.machine.autoscaling: We can define this multiple times to override the runners.machine settings for specific time periods. In this example we tell docker-machine to keep idle runners for 6 hours (IdleTime = 21600) on the weekend. For me this makes sense because I mostly use the runners for personal projects during the weekend. When I start a few jobs at the weekend I will probably start more, so there is no need to delete idle runners after just 30 minutes. If you want to use this kind of setup in a work environment you probably want other autoscaling settings: for example, a bunch of IdleCount machines and a high IdleTime during the week from morning to evening, fewer at night and almost none at the weekend.
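
To make the interplay between the default IdleTime and the weekend override a bit more concrete, here is a small Python sketch of the rule described above. It is only a simplified model and not how the runner or docker-machine is actually implemented: the function names are made up for illustration, timezones are ignored, and the only period it knows about is the sat,sun one from the example config.

```python
from datetime import datetime

# Values taken from the example config.toml above.
DEFAULT_IDLE_TIME = 1800    # [runners.machine] IdleTime, in seconds
WEEKEND_IDLE_TIME = 21600   # [[runners.machine.autoscaling]] IdleTime, in seconds


def effective_idle_time(now: datetime) -> int:
    """Return the idle time (seconds) that applies at the given moment.

    Hypothetical helper: it only models the "sat,sun" period of the example.
    """
    # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    if now.weekday() >= 5:
        return WEEKEND_IDLE_TIME
    return DEFAULT_IDLE_TIME


def should_delete(idle_since: datetime, now: datetime) -> bool:
    """True if a machine idle since `idle_since` has exceeded the idle time."""
    idle_seconds = (now - idle_since).total_seconds()
    return idle_seconds >= effective_idle_time(now)


if __name__ == "__main__":
    wednesday = datetime(2024, 1, 3, 12, 0)
    saturday = datetime(2024, 1, 6, 12, 0)
    # The same 31 minutes of idling are treated differently.
    print(should_delete(wednesday, wednesday.replace(minute=31)))
    print(should_delete(saturday, saturday.replace(minute=31)))
```

In this model a machine that has idled for 31 minutes is removed on a Wednesday but kept on a Saturday, which is the behaviour the weekend override is meant to achieve.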

The Costs

What does it cost to run GitLab runners on Hetzner Cloud? Unfortunately there is no easy answer. The most important factor is the machine flavor we choose; the range goes from 0.005€/h (1 CPU, 2GB memory) to 0.095€/h (16 CPUs, 32GB memory). I always find these hourly prices hard to grasp, so let's put it another way: if we run a runner of the smallest flavor for a full month we will pay roughly 3€, with the biggest flavor we pay 60€. The cpx11 flavor of our example costs 0.007€/h, which means roughly 4€ per month. In my use cases none of the runners will run for a full month and during most weekdays none will be created whatsoever, so I don't expect more than a few bucks per month. If you set IdleCount = 10 for a full month you will see a difference on the invoice, though. One thing to keep in mind while configuring the IdleTime: Hetzner bills the machines hourly, so each started hour costs the same, no matter if the server runs for 1 minute or 59 minutes. A very small IdleTime therefore probably doesn't make sense, because we would create and delete servers more often than necessary, which means unnecessary costs.
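
The effect of hourly billing on the IdleTime choice is easier to see with a bit of arithmetic. The following Python sketch is a rough model under stated assumptions, not a billing calculator: it uses the cpx11 price quoted above, bills every started hour in full as described, runs one job at a time, ignores boot and provisioning time, and all function names are invented for this example.

```python
import math

HOURLY_PRICE_EUR = 0.007  # cpx11 price quoted in this post


def billed_cost(lifetime_minutes: float, hourly_price: float = HOURLY_PRICE_EUR) -> float:
    """Cost of one server when every started hour is billed in full."""
    started_hours = math.ceil(lifetime_minutes / 60)
    return started_hours * hourly_price


def total_cost(job_starts, job_minutes: float, idle_minutes: float) -> float:
    """Total cost for a series of sequential jobs (all times in minutes).

    Assumed model: a machine is reused if the next job starts before it has
    been idle for `idle_minutes`; otherwise it is deleted after idling that
    long and a fresh machine is created for the next job.
    """
    cost = 0.0
    machine_start = None    # when the current machine was created
    machine_free_at = None  # when the current machine finished its last job
    for start in job_starts:
        reusable = (
            machine_start is not None
            and start - machine_free_at <= idle_minutes
        )
        if not reusable:
            if machine_start is not None:
                # The old machine lived until its idle timeout expired.
                cost += billed_cost(machine_free_at + idle_minutes - machine_start)
            machine_start = start
        machine_free_at = start + job_minutes
    if machine_start is not None:
        cost += billed_cost(machine_free_at + idle_minutes - machine_start)
    return cost


if __name__ == "__main__":
    starts = [0, 10, 20, 30, 40, 50]  # one 5-minute job every 10 minutes
    print(round(total_cost(starts, job_minutes=5, idle_minutes=1), 3))
    print(round(total_cost(starts, job_minutes=5, idle_minutes=30), 3))
```

With a one-minute idle time, each of the six jobs gets its own machine and each machine is billed for a full hour. With a 30-minute idle time, a single machine handles all six jobs and lives for 85 minutes, so only two hours are billed. In this toy scenario the longer IdleTime is the cheaper option.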

Conclusion

With this fairly easy setup we now have autoscaling, on-demand GitLab runners on Hetzner Cloud. We only pay for what we use and it works super reliably. 🚀