dock0 Round Two: Building lightweight VMs -- 17 Dec 2014

Building on my previous work, I’ve been working to solve some pain points for deploying and managing VMs. There were a couple big ones that I specifically targetted:

  1. Speed. It turns out that building a kernel and an Arch environment from scratch on each VM takes a long time.
  2. In-place upgrades. All upgrades were full redeployments. Coupled with the speed problem, this got pretty annoying
  3. Secure data. When all the data on the VM is generated on-site and sourced from public GitHub repos, passing secure data (think SSL keys or API keys) becomes problematic.

Thankfully, after some thought, I realized that these problems are all facets of what was, in hindsight, a poor design choice: generating the components on the VM itself. In round two, I set out to find a better route.

Static Assets

Plotting out the various things I’m building, it turns out that the vast majority of them don’t need to be generated per-VM anyways, because they don’t change between specific instances. As a first step, I started breaking out these shared components into their own repositories.

The pilot for this effort was my kernel repo. I’ve been building kernels longer than any of the other components, so it was easier to start with a process I understood well. I combined this with the work I’d done for my Archlinux Docker container, for which I’d written targit. That project required building an asset and then uploading it to GitHub as a versioned release asset, which translated really well to my new efforts for the VM.

I ended up building a pretty generic workflow for creating and uploading versioned assets, with help from build containers. Here’s the Makefile from the kernel repo:

This workflow is now a standard across all my build repos. Running make builds the Dockerfile in ./meta/ and launches the container (the launch script gives the container access to the repo directory, my SSH auth socket, and my .gitconfig). The container runs make local, which builds the asset and then pushes up the tag and asset to GitHub. The only parts of this that change from repo to repo are the build steps and the Dockerfile for the build container.

This process gives me an easy to use set of static assets, where before I was building at deploy time on each VM:

Dynamic Assets

I initially investigated dynamic asset handling with a similar mindset to the static assets. I’d just finished tuning static asset building to be quick and painless: why not apply the same wins to the per-node configs?

I ended up rejecting that idea partly because it doesn’t scale (1 GitHub asset per version per node starts getting very big very vast) and partly due to security concerns (if I’m putting sensitive data in the configs, I need to really trust my encryption since the asset will persist on GitHub).

Given that configuration would need to be generated locally due to the eventual inclusion, I decided to combine this generation in the same repo that knew how to talk to the Linode API. To do the actual config generation, I extended dock0 to support generating config bundles, because dock0 already supported the idea of stacking multiple config files and handling templates and scripts. I renamed my host_config repo to deploy_tool and adjusted it to work with this.

With that done, dock0 config config.yaml configs/<hostname>.yaml let me build a config tarball for all of an instance’s information.

Putting It All Together

Having built all of these pieces, the remaining work was to actually get them put together on a VM.

Versioned Static Artifacts

Learning from past iterations, I wanted to handle versioning from the start so that I could support upgrades of existing systems. To do this, I needed to track the version of each asset. I also wanted to make rolling back as easy as possible by tracking “releases”, or each combination of asset versions that had actually been deployed at once.

I started by trying to write this versioning logic directly into dock0 as an “install” subcommand, but I quickly realized what I was writing was its own separate library. I broke the versioning/release logic out into menagerie, whose sole job was to accept a list of artifacts and versions and manage releases on the filesystem. I then taught the dock0 install command to handle templating and invoke menagerie for the release management. The resulting structure looks like this:

I wrapped the dock0 install process in a Docker container, which was the first step in realizing my goal of in-place upgrades. With this, I could run the Docker container on an existing deployment, and it would pull down any new artifact versions it needed, rotate the existing releases up by one, and create the new release as ./releases/0 and ./latest. Now from grub I could easily revert to an older release if I broke something when upgrading.

Deploying Dynamic Bits

The remaining piece involved a bit more juggling. Teaching the deploy_tool code to deploy a Linode was pretty easy, and like before I used a StackScript to build my happy VM from the original maker disk image (though this time I’m using dock0 and menagerie, which is way faster). The remaining part is to use dock0 config locally and then get that tarball onto the server.

The problem? When the deploying-the-Linode API work finishes, it’s not ready to receive the tarball, nothing is set up yet or even finished booting. I could have done some gimmicky work by SCPing up the tarball to a known location and counting on the time for menagerie to run so I could unpack the tarball after, but I’m not a fan of counting on timing and that wouldn’t help me with later in-place upgrades. I ended up solving this using another Docker container. This container is spawned by the StackScript after it pulls down static artifacts. It runs in the foreground, starts an SSH server and a very simple HTTP server, and waits for /tmp/.done to exist. When that file exists, it unpacks /tmp/build.tar.gz to the right spot and exits. I then wrote the local configure script: it polls until the HTTP server is up, then uploads the tarball, then touchs /tmp/.done. This method resolves the timing concerns and opens the door for in-place upgrades, since I can spawn the same container and run the local configure script for an existing deployment too.

The Results

I feel pretty good about the new process, for now. It’s solved all my major concerns with the previous method, though I’m sure it will continue to need tweaking as my usage changes. I conducted some very unscientific tests and the deploy time is somewhere around 8 minutes from running ./meta/deploy.rb <hostname> locally to having the final system configured and accessible via SSH. That’s 2 to 3 times as fast as the previous method on a good day, and even better when there’s CPU contention (kernel building is hard work). That lets me focus more on deving other things inside the VMs, which was the original purpose for push-button rebuilds.

With any luck, I’ll be writing more posts soon about some of the more specific challenges I ran in to during the process, or the design choices I made inside the VM itself.