Monday, October 7, 2013

Optimization Tips & Tricks used by MimeKit: Part 2

In my previous blog post, I talked about optimizing the most critical loop in MimeKit's MimeParser by:

  • Extending our read buffer by an extra byte (which later became 4 extra bytes) that I could set to '\n', allowing me to do the bounds check after the loop as opposed to in the loop, saving us roughly half the instructions.
  • Unrolling the loop in order to check for 4 bytes at a time for that '\n' by using some bit twiddling hacks (for 64-bit systems, we might gain a little more performance by checking 8 bytes at a time).

After implementing both of those optimizations, the time taken for MimeKit's parser to parse nearly 15,000 messages in a ~1.2 gigabyte mbox file dropped from around 10s to about 6s on my iMac with Mono 3.2.3 (32-bit). That is a massive increase in performance.

Even after both of those optimizations, that loop is still the most critical loop in the parser and the MimeParser.ScanContent() method, which contains it, is still the most critical method of the parser.

While the loop itself was a huge chunk of the time spent in that method, the next largest offender was writing the content of the MIME part into a System.IO.MemoryStream.

MemoryStream, for those that aren't familiar with C#, is just what it sounds like it is: a stream backed by a memory buffer (in C#, this happens to be a byte array). By default, a new MemoryStream starts with a buffer of about 256 bytes. As you write more to the MemoryStream, it resizes its internal memory buffer to either the minimum size needed to hold the its existing content plus whatever number of bytes your latest Write() was called with or double the current internal buffer size, whichever is larger.

The performance problem here is that for MIME parts with large amounts of content, that buffer will be resized numerous times. Each time that buffer is resized, due to the way C# works, it will allocate a new buffer, zero the memory, and then copy the old content over to the new buffer. That's a lot of copying and creates a situation where the write operation can become exponentially worse as the internal buffer gets larger. Since MemoryStream contains a GetBuffer() method, its internal buffer really has to be a single contiguous block of memory. This means that there's little we could do to reduce overhead of zeroing the new buffer every time it resizes beyond trying to come up with a different formula for calculating the next optimal buffer size.

At first I decided to try the simple approach of using the MemoryStream constructor that allows specifying an initial capacity. By bumping up the initial capacity to 2048 bytes, things did improve, but only by a very disappointing amount. Larger initial capacities such as 4096 and 8192 bytes also made very little difference.

After brainstorming with my coworker and Mono runtime hacker, Rodrigo Kumpera, we decided that one way to solve this performance problem would be to write a custom memory-backed stream that didn't use a single contiguous block of memory, but instead used a list of non-contiguous memory blocks. When this stream needed to grow its internal memory storage, all it would need to do is allocate a new block of memory and append it to its internal list of blocks. This would allow for minimal overhead because only the new block would need to be zeroed and no data would need to be re-copied, ever. As it turns out, this approach would also allow me to limit the amount of unused memory used by the stream.

I dubbed this new memory-backed stream MimeKit.IO.MemoryBlockStream. As you can see, the implementation is pretty trivial (doesn't even require scary looking bit twiddling hacks like my previous optimization), but it made quite a difference in performance. By using this new memory stream, I was able to shave a full second off of the time needed to parse that mbox file I mentioned earlier, getting the total time spent down to 5s. That's starting to get pretty respectable, performance-wise.

As a comparison, let's compare the performance of MimeKit with what seems to be the 2 most popular C# MIME parsers out there (OpenPOP.NET and SharpMimeTools) and see how we do. I've been hyping up the performance of MimeKit a lot, so it had better live up to expectations, right? Let's see if it does.

Now, since none of the other C# MIME parsers I could find support parsing the Unix mbox file format, we'll write some test programs to parse the same message stream over and over (say, 20 thousand times) to compare MimeKit to OpenPOP.NET.

Here's the test program I wrote for OpenPOP.NET:

using System;
using System.IO;
using System.Diagnostics;
using OpenPop.Mime;

namespace OpenPopParser {
    class Program
    {
        public static void Main (string[] args)
        {
            var stream = File.OpenRead (args[0]);
            var stopwatch = new Stopwatch ();

            stopwatch.Start ();
            for (int i = 0; i < 20000; i++) {
                var message = Message.Load (stream);
                stream.Position = 0;
            }
            stopwatch.Stop ();

            Console.WriteLine ("Parsed 20,000 messages in {0}", stopwatch.Elapsed);
        }
    }
}

Here's the SharpMimeTools parser I wrote for testing:

using System;
using System.IO;
using System.Diagnostics;
using anmar.SharpMimeTools;

namespace SharpMimeParser {
    class Program
    {
        public static void Main (string[] args)
        {
            var stream = File.OpenRead (args[0]);
            var stopwatch = new Stopwatch ();

            stopwatch.Start ();
            for (int i = 0; i < 20000; i++) {
                var message = new SharpMessage (stream);
                stream.Position = 0;
            }
            stopwatch.Stop ();

            Console.WriteLine ("Parsed 20,000 messages in {0}", stopwatch.Elapsed);
        }
    }
}

And here is the test program I used for MimeKit:

using System;
using System.IO;
using System.Diagnostics;
using MimeKit;

namespace MimeKitParser {
    class Program
    {
        public static void Main (string[] args)
        {
            var stream = File.OpenRead (args[0]);
            var stopwatch = new Stopwatch ();

            stopwatch.Start ();
            for (int i = 0; i < 20000; i++) {
                var parser = new MimeParser (stream, MimeFormat.Default);
                var message = parser.ParseMessage ();
                stream.Position = 0;
            }
            stopwatch.Stop ();

            Console.WriteLine ("Parsed 20,000 messages in {0}", stopwatch.Elapsed);
        }
    }
}

Note: Unfortunately, OpenPOP.NET's message parser completely failed to parse the Star Trek message I pulled out of my test suite at random (first message in the jwz.mbox.txt file included in MimeKit's UnitTests project) due to the Base64 decoder not liking some byte or another in the stream, so I had to patch OpenPOP.NET to no-op its base64 decoder (which, if anything, should make it faster).

And here are the results running on my 2011 MacBook Air:

[fejj@localhost OpenPopParser]$ mono ./OpenPopParser.exe ~/Projects/MimeKit/startrek.msg
Parsed 20,000 messages in 00:06:26.6825190

[fejj@localhost SharpMimeParser]$ mono ./SharpMimeParser.exe ~/Projects/MimeKit/startrek.msg
Parsed 20,000 messages in 00:19:30.0402064

[fejj@localhost MimeKit]$ mono ./MimeKitParser.exe ~/Projects/MimeKit/startrek.msg
Parsed 20,000 messages in 00:00:15.6159326

Whooooosh!

Not. Even. Close.

MimeKit is nearly 25x faster than OpenPOP.NET even after making its base64 decoder a no-op and 75x faster than SharpMimeTools.

Since I've been ranting against C# MIME parsers that made heavy use of regex, let me show you just how horrible regex is for parsing messages (performance-wise). There's a C# MIME parser called MIMER that is nearly pure regex, so what better library to illustrate my point? I wrote a very similar loop to the other 2 that I listed above, so I'm not going to bother repeating it again. Instead, I'll just skip to the results:

[fejj@localhost MimerParser]$ mono ./MimerParser.exe ~/Projects/MimeKit/startrek.msg
Parsed 20,000 messages in 00:16:51.4839129

Ouch. MimeKit is roughly 65x faster than a fully regex-based MIME parser. It's actually rather pathetic that this regex parser beats SharpMimeTools.

This is why, as a developer, it's important to understand the limitations of the tools you decide to use. Regex is great for some things but it is a terrible choice for others. As Jamie Zawinski might say,

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.

Code Snippet Licensing

All code posted to this blog is licensed under the MIT/X11 license unless otherwise stated in the post itself.