Performance Tuning with Perl
If you search Google for “perl performance tuning”, you will find quite a few interesting hits. As I had to modify one of my Perl scripts anyway, I thought “what the heck, I will try to improve its performance too”. Every seasoned programmer will tell you that this is a dangerous thought. Nevertheless, after I had modified the script according to the requirements, I took on its performance.
Every text I read stressed that you should measure the performance of your script before and after each modification. So I familiarized myself with the Benchmark module and ran an initial benchmark:
283 wallclock secs (277.34 usr + 0.67 sys = 278.02 CPU) @ 0.02/s (n=5)
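For illustration, a timing line like the one above can be produced with a few lines of Benchmark code. This is only a minimal sketch: process_log is a hypothetical stand-in for the real script's work, and the count of 5 simply mirrors the n=5 shown in the output.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Hypothetical stand-in for the real log-processing routine.
sub process_log {
    my $count = 0;
    for my $i (1 .. 10_000) {
        $count++ if "line $i" =~ /7/;
    }
    return $count;
}

# Run the routine 5 times and keep the resulting Benchmark object.
my $t = timeit(5, \&process_log);

# timestr formats the object as "N wallclock secs (... CPU) @ .../s (n=5)".
print timestr($t), "\n";
```

The numbers printed will of course differ from mine, since the stand-in routine does far less work than the real script.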
As the code basically reads a log file from disk, parses every line and writes out some results, I thought I could speed it up by reading the whole file into memory up front and iterating over that, thus reducing disk I/O. Well, the outcome of this “improvement” left me underwhelmed:
256 wallclock secs (252.03 usr + 0.00 sys = 252.03 CPU) @ 0.02/s (n=5)
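The two approaches look roughly like this. It is a sketch under assumptions: the temporary file, the sub names count_streaming and count_slurped, and the /7/ pattern are all made up for the example and are not taken from my script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample log; the real script reads an existing log file.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "entry $_\n" for 1 .. 1000;
close $fh or die "close failed: $!";

# Variant 1: stream the file and handle one line at a time.
sub count_streaming {
    my ($file) = @_;
    open my $in, '<', $file or die "Cannot open $file: $!";
    my $hits = 0;
    while (my $line = <$in>) {
        $hits++ if $line =~ /7/;
    }
    close $in;
    return $hits;
}

# Variant 2: read all lines into memory first, then iterate over the array.
sub count_slurped {
    my ($file) = @_;
    open my $in, '<', $file or die "Cannot open $file: $!";
    my @lines = <$in>;
    close $in;
    my $hits = 0;
    for my $line (@lines) {
        $hits++ if $line =~ /7/;
    }
    return $hits;
}

print count_streaming($path), "\n";
print count_slurped($path), "\n";
```

Both variants do the same parsing work per line, so if the parsing dominates the run time, changing how the lines arrive will not move the total much.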
Not really the improvement I was looking for. So I implemented one of the suggestions: precompiling the regular expressions I am using. As I have arrays of regexes in my script, this sounded promising. A small side benchmark showed an improvement from 15.4 down to 4 seconds for 100,000 iterations of my little test script. So I implemented that in my real-world script and got:
258 wallclock secs (253.52 usr + 0.00 sys = 253.52 CPU) @ 0.02/s (n=5)
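Precompiling an array of regexes can be sketched with qr// as follows. The patterns and the sub name first_match are invented for the example; the regexes in my script are different.

```perl
use strict;
use warnings;

# Hypothetical patterns; the real script's regexes are not shown here.
my @patterns = ('ERROR', 'WARN(?:ING)?', 'timeout after \d+s');

# Compile each pattern once, up front, instead of interpolating
# the pattern string into a match on every line.
my @compiled = map { qr/$_/ } @patterns;

# Return the source of the first pattern that matches, or nothing.
sub first_match {
    my ($line) = @_;
    for my $i (0 .. $#compiled) {
        return $patterns[$i] if $line =~ $compiled[$i];
    }
    return;
}

my $hit = first_match("2024 WARNING disk almost full");
print defined $hit ? "matched: $hit\n" : "no match\n";
```

Whether this pays off depends on how much of the total run time is spent compiling patterns in the first place, which is something only a measurement of the real script can tell.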
OK, that was not really nice. So I tried the next tip, “Don’t modify stack”, which reads:
The following sin is frequently found even in the Perl doc:
my $self = shift;
Unless you have a pertinent reason for this, use this:
my( $self, $x, $y, @z ) = @_;
So I searched my code for “shift @_” and replaced each occurrence with a single list assignment from @_. This is what I got:
95 wallclock secs (93.09 usr + 0.00 sys = 93.09 CPU) @ 0.05/s (n=5)
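To check this tip in isolation, the two argument-unpacking styles can be compared directly with cmpthese from the Benchmark module. The subs with_shift and with_list are hypothetical and trivially small, so treat this as a sketch: the size of the gap, and even which style wins, may vary with the Perl version and with what the subs actually do.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Unpack arguments by shifting them off @_ one at a time.
sub with_shift {
    my $x = shift;
    my $y = shift;
    return $x + $y;
}

# Unpack arguments with a single list assignment; @_ is left untouched.
sub with_list {
    my ($x, $y) = @_;
    return $x + $y;
}

# A negative count means: run each variant for at least that many CPU seconds.
cmpthese(-1, {
    'shift' => sub { with_shift(1, 2) },
    'list'  => sub { with_list(1, 2) },
});
```

cmpthese prints a small table with the rate of each variant and the relative difference between them.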
WOW. Now that is NICE. From 258 seconds down to 95. That’s what I call an improvement. Now there was one more thing to try: “pass by reference” instead of “pass by value”. As I pass around a lot of text, this actually made sense to me. So I modified all function calls to use “pass by reference” and benchmarked it:
95 wallclock secs (93.94 usr + 0.00 sys = 93.94 CPU) @ 0.05/s (n=5)
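For readers unfamiliar with the two calling styles, here is a minimal sketch. The sub names and the sample text are made up. Note that Perl aliases the caller's arguments in @_, so the copy in the “by value” version happens when the argument is assigned to a lexical variable.

```perl
use strict;
use warnings;

# "Pass by value": the whole string is copied into $text.
sub count_lines_by_value {
    my ($text) = @_;
    return ($text =~ tr/\n//);    # tr without replacement just counts
}

# "Pass by reference": only the reference is copied, not the string.
sub count_lines_by_ref {
    my ($text_ref) = @_;
    return ($$text_ref =~ tr/\n//);
}

my $log = "line\n" x 1000;        # hypothetical sample text

print count_lines_by_value($log), "\n";
print count_lines_by_ref(\$log), "\n";
```

The saving is one string copy per call, so it only shows up in the total when the strings are large or the calls are very frequent compared to the rest of the work.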
OK, that didn’t help as much as I had anticipated. But I am quite happy with the “do not modify stack” improvement. That gave my script a boost that made a real difference. I was surprised that reading the data ahead did not give more of a speed boost. I attribute that to the read-ahead caches in modern hard disk drives and operating systems.
My advice: measure your “improvements”, try a few different approaches, and do not rely entirely on synthetic benchmarks. If you can, benchmark production code.