“premature optimization is the root of all evil.”
— Sir Tony Hoare, popularized by Donald Knuth

Yes, it caught up with me as well ;)

I was playing around with Terrible (Terraform and Ansible) to deploy some software on a libvirt environment, basically to learn stuff. But starting the domain (VM) for the first time, bootstrapping it from a cloud-init ISO and updating all packages took way too long for my taste.
Being young and stupid (*ahem*), I jumped to the conclusion that the yum update was just too slow and I needed to speed it up.

First things first: you can’t see an improvement if you don’t measure. So let’s take a baseline:

terraform apply -auto-approve  1.53s user 0.42s system 0% cpu 3:52.50 total

3 minutes 53 seconds. This gets annoying over time. So the “brilliant” idea I had was to set up a Squid caching proxy to serve the RPM files needed for the updates locally, thus cutting down on the yum update time.

So I started a Squid Docker image, created a working squid.conf after some debugging, opened the necessary firewall ports and modified my cloud-init configuration so that yum would use my caching proxy:

 - 'echo "proxy=" >> /etc/yum.conf'

Test again, first run:

terraform apply -auto-approve  1.31s user 0.33s system 0% cpu 3:54.64 total

As expected, no improvement. But now, the second time:

terraform apply -auto-approve  1.35s user 0.32s system 0% cpu 3:56.31 total

Damn! Why u so slow? After another espresso, I noticed that yum was talking to different mirrors. The proxy sees each mirror’s URL as a new request, so it can’t serve those packages from its cache. So, next iteration: I reconfigured yum via cloud-init so that a single mirror close to me is used for all requests.
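Why the cache stayed cold can be shown with a toy model. This is a plain URL-keyed cache, not Squid’s actual logic: the same RPM fetched from different mirrors arrives under different URLs, so each one counts as a separate object. The mirror hosts and package paths are made up for illustration.

```python
import random

# Hypothetical mirror hosts and package paths, purely for illustration.
MIRRORS = [
    "http://mirror-a.example.org",
    "http://mirror-b.example.org",
    "http://mirror-c.example.org",
]
PACKAGES = [f"/updates/x86_64/Packages/pkg{i}.rpm" for i in range(50)]


class UrlKeyedCache:
    """Toy caching proxy: a stored object is reused only for the exact same URL."""

    def __init__(self):
        self.store = set()
        self.hits = 0
        self.misses = 0

    def fetch(self, url):
        if url in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store.add(url)  # pretend we downloaded and cached it


def run_update(cache, pick_mirror):
    """Simulate one update run that downloads every package once."""
    for path in PACKAGES:
        cache.fetch(pick_mirror() + path)


def simulate(pinned, seed=0):
    """Return (hits, misses) of the second run, after a first run primed the cache."""
    rng = random.Random(seed)
    # Pinned: one fixed mirror. Unpinned: a random mirror per download.
    candidates = MIRRORS[:1] if pinned else MIRRORS
    cache = UrlKeyedCache()
    run_update(cache, lambda: rng.choice(candidates))  # first run primes the cache
    cache.hits = cache.misses = 0  # count only the second run
    run_update(cache, lambda: rng.choice(candidates))
    return cache.hits, cache.misses


if __name__ == "__main__":
    print("random mirror per request (hits, misses):", simulate(pinned=False))
    print("single pinned mirror      (hits, misses):", simulate(pinned=True))
```

With a random mirror per download, only about a third of the second run’s requests hit the cache on average in this model. Pinning one mirror turns every one of them into a hit.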

Prime the cache, and then, the second run:

terraform apply -auto-approve  1.35s user 0.29s system 0% cpu 3:55.84 total
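Since the whole point was to measure, here is a small sketch that puts the runs so far side by side. It converts the `M:SS.ss total` field of the zsh-style `time` summaries above into seconds. The regex assumes runs shorter than an hour.

```python
import re

# The `time` summaries from the runs above, copied verbatim.
RUNS = [
    ("baseline", "terraform apply -auto-approve  1.53s user 0.42s system 0% cpu 3:52.50 total"),
    ("proxy, first run", "terraform apply -auto-approve  1.31s user 0.33s system 0% cpu 3:54.64 total"),
    ("proxy, second run", "terraform apply -auto-approve  1.35s user 0.32s system 0% cpu 3:56.31 total"),
    ("pinned mirror, second run", "terraform apply -auto-approve  1.35s user 0.29s system 0% cpu 3:55.84 total"),
]

# Matches the wall-clock 'M:SS.ss total' field; assumes runs shorter than an hour.
TOTAL_RE = re.compile(r"(\d+):(\d+(?:\.\d+)?) total$")


def total_seconds(line):
    """Convert the 'M:SS.ss total' field of a zsh time summary to seconds."""
    match = TOTAL_RE.search(line)
    if match is None:
        raise ValueError(f"no 'M:SS.ss total' field in: {line!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def compare(runs):
    """Yield (label, seconds, change vs. the first run in percent)."""
    baseline = total_seconds(runs[0][1])
    for label, line in runs:
        secs = total_seconds(line)
        yield label, secs, (secs - baseline) / baseline * 100


if __name__ == "__main__":
    for label, secs, change in compare(RUNS):
        print(f"{label:<28} {secs:7.2f} s  {change:+6.1f} %")
```

Every proxy run came in roughly 2 to 4 seconds slower than the 232.5 s baseline. That is noise, not an improvement.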

The more I optimized, the worse it got. Why? Well, now I finally took the time to actually look at /var/log/messages while the cloud-init code was running. And lo and behold, the yum update I thought was the bottleneck was only a tiny fraction of the actual runtime. As I am already running on SSDs, the only changes left to make were enabling host-passthrough in the libvirt CPU configuration and adding more CPUs (MORRR POWER !!!!!). With that, I finally ended up at:

terraform apply -auto-approve  1.30s user 0.31s system 0% cpu 3:10.39 total
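In hindsight, reading the log should have been step one. Here is a minimal sketch of doing that systematically. It assumes the classic syslog line format of /var/log/messages (`Mmm dd HH:MM:SS host tag: message`) and reports the largest pauses between consecutive entries, which point at the slow steps. The script name in the usage line is hypothetical.

```python
import sys
from datetime import datetime


def parse_ts(line, year=2000):
    """Parse the leading 'Mmm dd HH:MM:SS' syslog timestamp, e.g. 'Mar  3 10:15:02'.

    Classic syslog lines carry no year, so a placeholder (leap) year is added;
    only differences between timestamps matter here.
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def largest_gaps(lines, top=3):
    """Return the `top` largest pauses as (seconds, line logged just before the pause).

    Note: a log that crosses midnight on Dec 31 is not handled.
    """
    entries = []
    for line in lines:
        try:
            entries.append((parse_ts(line), line.rstrip("\n")))
        except ValueError:
            continue  # skip lines that don't start with a syslog timestamp
    # Pair each entry with the next one; a big difference means a slow step.
    gaps = [
        ((t1 - t0).total_seconds(), text0)
        for (t0, text0), (t1, _) in zip(entries, entries[1:])
    ]
    gaps.sort(key=lambda gap: gap[0], reverse=True)
    return gaps[:top]


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for seconds, line in largest_gaps(fh):
                print(f"{seconds:8.0f} s after: {line}")
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)
```

Usage: `python3 log_gaps.py /var/log/messages`, ideally run right after the first boot has finished.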

What actually improved my performance was adding snapshots of the domain (VM) to my Ansible playbooks and resetting to them, so I had quick turnaround cycles while debugging a setup issue further down the chain.
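The snapshot trick is cheap to reproduce. Below is a minimal sketch that assumes plain `virsh` on the libvirt host. The domain and snapshot names are hypothetical, and the script only prints the commands unless `dry_run=False` is passed. In a playbook, the same `virsh` calls could run from a `command` task: create the snapshot right after the slow base setup, and revert to it at the start of the part being debugged. As far as I know, internal snapshots like these need qcow2-backed disks.

```python
import shlex
import subprocess


def snapshot_create_cmd(domain, name):
    """argv for taking a named (internal) snapshot of a libvirt domain."""
    return ["virsh", "snapshot-create-as", "--domain", domain, "--name", name]


def snapshot_revert_cmd(domain, name):
    """argv for rolling the domain back to that snapshot."""
    return ["virsh", "snapshot-revert", "--domain", domain, "--snapshotname", name]


def run(argv, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    DOMAIN = "terrible-vm"     # hypothetical domain name
    SNAP = "after-base-setup"  # hypothetical snapshot name
    # 1. Once the slow cloud-init/base provisioning is done, freeze that state.
    run(snapshot_create_cmd(DOMAIN, SNAP))
    # 2. While debugging a later step, roll back instead of rebuilding the VM.
    run(snapshot_revert_cmd(DOMAIN, SNAP))
```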

Repeat after me: premature optimization is the root of all evil (although you learn new stuff while doing that).

