Premature Optimization is the Root of all Evil
"Premature optimization is the root of all evil." - Sir Tony Hoare, popularized by Donald Knuth
Yes, it caught up with me as well ;)
I was playing around with Terrible (Terraform and Ansible) to deploy some software on a libvirt environment, basically to learn stuff. But starting the domain (VM) for the first time, bootstrapping from a cloud-init ISO and updating all packages took way too long for my taste.
Being young and stupid (*ahem*), I jumped to the conclusion that the yum update was just too slow and I needed to speed it up.
First things first: you can’t see an improvement if you don’t measure. So let’s take a baseline:
terraform apply -auto-approve 1.53s user 0.42s system 0% cpu 3:52.50 total
3 minutes 53 seconds. This gets annoying over time. So the "brilliant" idea I had was to set up a squid caching proxy that would serve the RPM files needed for the updates locally, thus cutting down on the yum update time.
So I started up a squid docker image, after some debugging created a squid.config that worked, opened the necessary firewall ports and modified my cloud-init configuration so that yum would use my caching proxy.
runcmd:
  - 'echo "proxy=http://192.168.142.143:3128" >> /etc/yum.conf'
  - 'echo "proxy=https://192.168.142.143:3128" >> /etc/yum.conf'
Test again, first time:
terraform apply -auto-approve 1.31s user 0.33s system 0% cpu 3:54.64 total
As expected, no improvement. But now, the second time:
terraform apply -auto-approve 1.35s user 0.32s system 0% cpu 3:56.31 total
Damn! Why u so slow? After another espresso, I noticed that yum was talking to different mirrors, so the proxy saw each download as a new request and couldn’t serve it from the cache. So, next iteration: I reconfigured yum via my cloud-init so that a single mirror close to me would be used for all requests.
Prime the cache, and then the second run:
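To make the cache-miss effect concrete, here is a toy sketch of a cache that, like a simple caching proxy, keys stored objects by their full URL. It is not how squid is implemented, just an illustration; the mirror hostnames and the package path are made up.

```python
from urllib.parse import urlsplit


class UrlKeyedCache:
    """Toy stand-in for a caching proxy that keys objects by full URL."""

    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0

    def fetch(self, url):
        if url in self.store:
            self.hits += 1
        else:
            self.misses += 1
            # Pretend to download the object and remember it under its URL.
            self.store[url] = f"<payload of {urlsplit(url).path}>"
        return self.store[url]


# Hypothetical mirror hosts and package path, for illustration only.
PACKAGE_PATH = "/repo/os/x86_64/Packages/example-1.0-1.x86_64.rpm"
ROTATING_MIRRORS = ["mirror-a.example", "mirror-b.example", "mirror-c.example"]


def simulate(hosts):
    """Fetch the same package once per host and return (hits, misses)."""
    cache = UrlKeyedCache()
    for host in hosts:
        cache.fetch(f"http://{host}{PACKAGE_PATH}")
    return cache.hits, cache.misses


rotating = simulate(ROTATING_MIRRORS)        # three runs, three different mirrors
pinned = simulate(["mirror-a.example"] * 3)  # three runs, one pinned mirror
print("rotating mirrors (hits, misses):", rotating)
print("pinned mirror    (hits, misses):", pinned)
```

With rotating mirrors the same file is stored three times and never reused; with a pinned mirror only the first run misses.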
terraform apply -auto-approve 1.35s user 0.29s system 0% cpu 3:55.84 total
This is getting worse, not better. Why? Well, now I took the time to actually take a look at /var/log/messages while the cloud-init code was running. And lo and behold, the yum update I thought was the bottleneck was a tiny fraction of the actual runtime. As I am already running on SSDs, the only changes I was able to make were enabling host-passthrough for the libvirt CPU configuration and adding more CPUs (MORRR POWER !!!!!). With that I finally ended up at:
terraform apply -auto-approve 1.30s user 0.31s system 0% cpu 3:10.39 total
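To put the measurements side by side, here is a small sketch that converts the wall-clock totals into seconds and compares each run with the baseline. It assumes the "M:SS.ss total" format shown in the output above (which looks like the zsh time builtin); the run labels are mine.

```python
def parse_total(total):
    """Convert an 'M:SS.ss' wall-clock total into seconds."""
    minutes, seconds = total.split(":")
    return int(minutes) * 60 + float(seconds)


# Wall-clock totals copied from the runs above; the labels are descriptive only.
runs = {
    "baseline": "3:52.50",
    "proxy, cold cache": "3:54.64",
    "proxy, warm cache": "3:56.31",
    "proxy, pinned mirror": "3:55.84",
    "host-passthrough + more CPUs": "3:10.39",
}

baseline = parse_total(runs["baseline"])
for label, total in runs.items():
    seconds = parse_total(total)
    delta = seconds - baseline
    print(f"{label:30s} {seconds:7.2f}s  {delta:+7.2f}s  ({delta / baseline:+.1%})")
```

All three proxy runs land within a few seconds of the baseline (slightly slower, in fact), while the CPU change saves about 42 seconds, roughly 18%.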
What actually improved my turnaround time was taking and reverting snapshots of the domain (VM) from within my Ansible playbooks, which gave me quick cycles when debugging a setup issue later in the chain.
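The snapshot trick can be sketched outside of Ansible as well. The following is not my playbook code, just a minimal Python illustration that builds the two virsh commands involved (snapshot-create-as and snapshot-revert); the domain and snapshot names are made up, and by default the commands are only printed, not executed.

```python
import subprocess


def snapshot_create_cmd(domain, name):
    """Build the virsh command that snapshots `domain` under `name`."""
    return ["virsh", "snapshot-create-as", domain, name]


def snapshot_revert_cmd(domain, name):
    """Build the virsh command that reverts `domain` to snapshot `name`."""
    return ["virsh", "snapshot-revert", domain, name]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Hypothetical names, for illustration only.
DOMAIN = "demo-vm"
SNAPSHOT = "after-bootstrap"

run(snapshot_create_cmd(DOMAIN, SNAPSHOT))  # take once, after the slow bootstrap
run(snapshot_revert_cmd(DOMAIN, SNAPSHOT))  # reset before each debugging attempt
```

The point is that the slow bootstrap is paid once, and every later debugging attempt starts from the snapshot instead of from a fresh cloud-init run.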
Repeat after me: premature optimization is the root of all evil (although you learn new stuff while doing that).