This past weekend I was talking with Andreia about how Pan is built on top of GMime and takes advantage of my awesomely speedy uuencode/uudecode routines, which reminded me that, a number of years ago, I had done some performance comparisons of GMime's uuencode program against the one in the GNU sharutils package.
I had compared GMime 1.90.0 (a pre-release of GMime 2.0) against GNU sharutils 4.2.0, and the results were pretty impressive... GMime's uuencoder was on the order of 3 times faster than the one in sharutils and produced exactly the same output.
The uudecoder and the base64 encoder/decoder were all roughly 7 times faster than their sharutils counterparts, so all around GMime outperformed GNU sharutils by quite a bit.
Anyways, re-reading my test results got me thinking that my uuencode routines could probably be optimized a bit more: they were lagging behind the base64 encoder routine, and there's really no reason they should be that far off.
Well, tonight I finally got off my butt and decided to take a look and figure out why. Upon scrolling down to my uuencode_step() routine, I immediately saw why:
Each loop iteration would collect up to 3 bytes from the input and bit-shift them into a 32-bit 'saved' variable (a state variable used for incrementally uuencoding an input stream). Then, once I had successfully extracted 3 bytes from the input, I would unpack them from 'saved' into 3 unsigned char variables. At that point I would encode them into a temporary output buffer. When this output buffer ('uubuf') grew to 60 bytes, I'd flush it to the real output buffer with a memcpy().
All of this extra copying of data around adds up after a while and really begins to impact performance.
Before making any changes, I timed how long it took the original version of my uuencode_step() function to encode linux-2.6.24.tar.gz on my system. An average result over numerous runs was as follows:
[fejj@localhost ~]$ time `gmime-uuencode linux-2.6.24.tar.gz linux-2.6.24.tar.gz > /dev/null`

real    0m0.470s
user    0m0.412s
sys     0m0.052s
After my rewrite, my new results were closer to:
[fejj@localhost ~]$ time `gmime-uuencode linux-2.6.24.tar.gz linux-2.6.24.tar.gz > /dev/null`

real    0m0.291s
user    0m0.252s
sys     0m0.024s
For the sake of comparison, the best time I could manage to get from GNU sharutils 4.6.2 was as follows:
[fejj@localhost ~]$ time `uuencode linux-2.6.24.tar.gz linux-2.6.24.tar.gz > /dev/null`

real    0m1.386s
user    0m1.276s
sys     0m0.092s
The new implementation of uuencode_step() in gmime/gmime-utils.c has been committed to the gmime svn module on GNOME's subversion server as revision 1216; this change should appear in the next release of GMime, which will likely be 2.2.17.
1. The system I tested this on was my Lenovo T61 laptop w/ a 7200 RPM harddrive running OpenSuSE 10.3 with various updates. The kernel was version 220.127.116.11-0.3-bigsmp.
model name : Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz
cpu MHz : 800.000
(i.e. my CPU was scaled down to 800 MHz at the time of testing)
2. The GMime uuencode implementation uses a GMimeStreamFs for both input and output. This stream class is a wrapper around the POSIX I/O system functions which unfortunately needs to perform an lseek() before each read() or write() call in order to make sure that the underlying file descriptor is at the expected offset. This is necessary because multiple streams may share the same fd.
I mention this because an obvious rebuttal to GMime's superior performance might be to suspect that GMime's uuencode implementation "cheated" by using an mmap()'d input buffer where the GNU sharutils implementation might not.