This past weekend I was talking with Andreia about how Pan is built on top of GMime and takes advantage of my awesomely speedy uuencode/uudecode routines which reminded me that I had done some performance comparisons of GMime's uuencode program vs. the one in the GNU sharutils package a number of years ago.
I had compared GMime 1.90.0 (which was a pre-release of GMime 2.0) and GNU sharutils 4.2.0 and the results were pretty impressive... GMime's uuencoder was on the order of 3 times faster than the one in sharutils and produced exactly the same results.
The uudecoder and the base64 encoder/decoder were all roughly on the order of 7 times faster than those in GNU sharutils, so all around GMime outperformed GNU sharutils by quite a bit.
Anyways, re-reading my test results got me thinking that my uuencode routines could probably be optimized a bit more as they were lagging a bit behind the base64 encoder routine and there's really no reason it should be that far off.
Well, tonight I finally got off my butt and decided to take a look and figure out why. Upon scrolling down to my uuencode_step() routine, I immediately saw why:
Each loop would collect up to 3 bytes from the input and bit shift them into a 32bit 'saved' variable (which is a state variable used for incremental uuencoding an input stream). Then, if I had successfully extracted 3 bytes from the input, I would extract them out of 'saved' into 3 unsigned char variables. At this point I would then encode them into a temporary output buffer. When this output buffer ('uubuf') grew to 60 bytes, I'd flush it to the real output buffer with a memcpy().
All of this extra copying of data around adds up after a while and really begins to impact performance.
Before making any changes, I timed how long it took the original version of my uuencode_step() function to encode linux-2.6.24.tar.gz on my system[1]. An average result over numerous runs was as follows:
[fejj@localhost ~]$ time `gmime-uuencode linux-2.6.24.tar.gz linux-2.6.24.tar.gz > /dev/null` real 0m0.470s user 0m0.412s sys 0m0.052s
After my rewrite, my new results were closer to:
[fejj@localhost ~]$ time `gmime-uuencode linux-2.6.24.tar.gz linux-2.6.24.tar.gz > /dev/null` real 0m0.291s user 0m0.252s sys 0m0.024s
For the sake of comparison, the best time I could manage to get from GNU sharutils 4.6.2 was as follows:
[fejj@localhost ~]$ time `uuencode linux-2.6.24.tar.gz linux-2.6.24.tar.gz > /dev/null` real 0m1.386s user 0m1.276s sys 0m0.092s
The new implementation of uuencode_step() in gmime/gmime-utils.c has been committed to the gmime svn module on GNOME's subversion server, revision 1216 - this change should appear in the next release of GMime which will likely be 2.2.17.
Notes:
1. The system I tested this on was my Lenovo T61 laptop w/ a 7200 RPM harddrive running OpenSuSE 10.3 with various updates. The kernel was version 2.6.22.13-0.3-bigsmp.
From /proc/cpuinfo:
model name : Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz
cpu MHz : 800.000
(e.g. my cpu was scaled down at the time of testing)
2. The GMime uuencode implementation uses a GMimeStreamFs for input as well as output. This stream class is a wrapper around the POSIX I/O system functions which unfortunately has a sub-optimal need to perform an lseek() before each read() or write() call in order to make sure that the underlying file descriptor is in the expected position. This is necessary because it is possible for multiple streams to re-use the same fd.
I mention this because an obvious rebuttal to GMime's superior performance might be to suspect that GMime's uuencode implementation "cheated" by using an mmap()'d input buffer where the GNU sharutils implementation might not.
3 comments:
Nice post, thanks. :)
Wow - impressive!
This might be a naive question, but do you have any plans to submit this to sharutils? I don't use them (well, unless I'm using an app that uses them), but it seems like a lot of people could benefit from your work if you were able to get them integrated.
Thanks ;-)
I'm not sure it's really worth it to me to port the code over to the GNU SharUtils package, but I'm willing to offer my code and possibly assistance to the GNU SharUtils developers should they decide they would like to use my code.
As far as I know, several Linux distros already ship GMime's uuencode/uudecode programs as the system uuencode/decode programs in place of the GNU SharUtils versions, so it might not even be necessary to do.
Post a Comment