Tradeoffs in Software Provisioning Tools

A while ago my friend Alan and I were discussing configuration management. In particular we wondered why every configuration management tool has to ship a DSL, or be loaded from YAML files.

We wondered if it would be possible to just write code that deploys servers — it might let you describe what you want to do much more precisely.

I started working on a library that lets you do this. Basically, turn every module + state combination in Ansible into a function, add any required arguments as part of the method signature, and add a Opts dictionary for any optional arguments. Right now it looks something like this.

if err := core.AddGroup(ctx, host, "wheel", core.GroupOpts{
    System: false,
    Gid: "1001",
}); err != nil {
    log.Fatal(err)
}

But starting to implement this led to several more non-obvious tradeoffs.

Abstraction

This is the most obvious reason to use a configuration management tool. Whether you are deploying OpenBSD or Ubuntu or Darwin, you still need to create users and create folders and install packages. A good provisioning tool will abstract these for you, and choose sensible defaults.

However abstractions can be leaky; maybe one filesystem offers a feature that others don't, and it can be hard to make this available while also saying "this is only supported on these systems."

Run Commands On Local Machine vs. Remote Machine

Do you want to run commands on the machine that triggered the provisioning process, or the machine being provisioned? Take mysql for example. If you have a mysql client on the local machine, you can issue commands to the remote machine using the mysql protocol on the wire.

This requires the remote machine to expose port 3306, which you might not want to do. You also need to trust mysql's ability to encrypt a connection and trust a remote certificate. Compared with SSH, mysql has had much less auditing of its security code, and is not as good of a bet for encrypting/safely compressing content going over the wire. (This becomes more salient when you have N protocols for issuing commands over the wire, instead of just SSH.)

Another option would be to SSH to the remote machine, then run a mysql client on that machine to configure/provision MySQL. But this requires that you have a MySQL client on the remote machine. It's also considerably trickier to issue multiple MySQL commands over a single SSH session, and take action based on the results.

Run Multiple Commands Per SSH Connection

A single task in Ansible for "create this recursive directory" embeds a ton of complexity. If the directory exists, but has the wrong permissions, or the wrong owner, the directories are recursively created and chowned/chmodded. You can do this with SSH commands, e.g. ssh host mkdir foo && chmod 755 foo && chown -R root:wheel foo, but it gets more and more complicated, and tougher to determine which command failed, the more commands you layer on.

You can work around this by issuing each command as part of a single SSH connection, then getting the result, and making some decision based on it. But this vastly increases the latency of what you're trying to do, even if you enable pipelining you're talking about 1 second latency per operation.

Ansible works around this by copying a Python file to the remote machine, then running that file on the remote machine with several arguments. This occurs for each directive that Ansible runs. This has two implications: each machine needs to have Python on it, and Ansible is really slow - think one second per directive you put in Ansible.

With Go, we could copy a binary to the remote host and then run it. This would let us take advantage of Go's standard libraries (instead of issuing Unix commands directly via SSH). We could either compile+SCP this binary on the fly, or ship precompiled binaries for each architecture as part of the distribution.

But if we are going to go to that length, why not just add tools to compile the user's entire program, SCP that to the remote filesystem, and run it there?

Run Multiple Directives Per SSH Connection

The only way you are going to get really fast execution is by executing multiple directives/tasks/modules as part of a single SSH connection to the host. The way to achieve the most benefits would be to compile the user's entire configuration program, SCP the binary to the host, then run the binary on the host.

But this requires giving up some flexibility as well. Some Ansible tasks involve copying data from the remote machine to a local machine - for example, mysql_db in target mode. You can do this over the SSH connection, but it might be tricky to separate output that's part of control flow - e.g. "RUN: add group wheel" - from output that's supposed to be copied to the local machine — e.g. a mysql dump. Alternatively, if you need to copy a file from the local machine to the remote machine, you need to bundle that file as part of the target you SCP to the remote machine.

For Go specifically, you'd either need a Go binary on the remote machine, and then copy all of the source files, or you'd need to cross compile the source on the local machine, which means things like user.Current() won't work.

Conclusion

There are a few thorny problems that weren't immediately apparent when I started working on this. For the moment I'm going to try to proceed with the Go solution, porting over an existing set of Ansible directives, and I'm going to try to prioritize speed of remote execution.

But I'm much less confident is going to work well, without a lot of effort.

Liked what you read? I am available for hire.

Leave a Reply

Your email address will not be published. Required fields are marked *

Comments are heavily moderated.