A Moment of Zen: March 2007

Sunday, March 25, 2007

GMime Test Suite

Spent my weekend rewriting GMime's test-suite to be nearly fully automated. I say "nearly" because it's not completely done yet.

I've currently got fully automated tests for streams (concat stream has its very own), iconv utility functions, address parser, rfc2047/rfc2184 header encoders/decoders, quoted-string utilities, mbox parser, gpg context, and the pgp/mime utility classes.

The stream (including concat) tests generate their own test-case data and most of the others have built-in test data, however the mbox parser and pgp tests require manually created test data (the pgp tests just need pre-generated keys).

I could probably include the pre-generated gpg pub/private keys in the tarball distribution, but the test data for the mbox parser is around 11 megs, so I'm not sure how to approach that - I suppose I could tarball it up separately and upload it somewhere and upload a new version whenever I need to change it (or add more test data?).

I was kind of surprised that the only MIME parser test-suite messages I could find out there on the internet was the bundle of messages I used to host on my personal Ximian website (many of which I think I got from jwz?), put together for Camel a number of years ago (which is what I've been using to regression test GMime as well - only, until now, it hadn't been fully automated).

Do authors of other MIME parsers out there simply not have test suites for their parsers?

On a related note, I've just released GMime 2.2.5 with a few fixes (including some fixes for bugs I found only via the new automated test suite).

Wednesday, March 21, 2007

Will it Blend?

These are, apparently, part of Novell's new Linux ad campaigne. I found them pretty amusing...

mac_pc_linux.mpg
mac_pc_linux.ogg

mac_pc_linux_2.mpg
mac_pc_linux_2.ogg

mac_pc_linux_3.mpg
mac_pc_linux_3.ogg

Although... blending your end users? I'm not so sure... ;-)

You can find more of these videos at http://www.novell.com/video

Monday, March 19, 2007

DemoLib

Started hacking on a new project this weekend... just a little something to break the monotony.

The idea of DemoLib is to make my life simpler in those rare moments when I get inspired enough to write a 3d graphics demo. Just so I don't ever again have the excuse of not having all the proper utility classes and/or framework for writing a demo, I've gone and written those foundation classes/interfaces. I now have reasonable Vertex, Vector and Camera classes... along with a "Scene" interface and a default "Demo" class which plays the scenes, trying to keep a reasonable frame rate by slowing down or dropping frames, depending on the speed of rendering on the system.

I've also implemented 2 example Scene subclasses (neither take advantage of my Camera class yet... but that's because they were written before I had implemented it... they were both also precursors to my Vertex and Vector classes, but have been mostly ported to using those classes now).

Here are some token screenshots:

It's all in c#, because, well, why not? For those interested in watching my progress (or for laughing at the 3d graphics programming noob), I've checked DemoLib into Mono's subversion repository, so... `svn co` away.

Saturday, March 10, 2007

Indenting case contents

On Friday, I decided to start implementing some customizable settings for my Smart Indent feature in MonoDevelop. I started off easy by adding the ability to choose how you'd like to indent (or not) your goto labels:

When I had first implemented Smart Indent, I did a lot of mimicking of Emacs because that was my choice in editors. Having programmed in (nay, lived in) Emacs for going on a decade, I did with goto labels what Emacs did (in C), which is to say that I put them in column 2 (one space over from fully left-justified).

However, when I was rewriting my original implementation in C# using Emacs csharp-mode, I decided that when editing C# code, I much preferred the way csharp-mode unindented the goto label by only a single level. At first I was simply going to change the behavior but I had remembered that Visual Studio 2005 had an option for this, so I decided to implement the 3 options it had instead, defaulting, of course, to the same style as csharp-mode.

Today I implemented something a little more useful, a config option for case label indentation. While I was looking at Visual Studio's C# Formatting options, I discovered they had a "Indent case contents" option, so I figured I'd have a go at implementing that as well. After getting it to mostly work, I decided to test how exactly the option was handled in some situations in Visual Studio. Turns out, the option is completely ignored - no matter what the toggle state of the option was, it always indented case contents exactly the same. After discovering this, thinking it was a rather absurd option anyway, I decided to just drop it.

Tuesday, March 6, 2007

MonoDevelop Smart Indent

Since I joined the Mono team about 2 weeks ago, I've been working on a developer suite, MonoDevelop.

Anyways, for the past week or so I've been working on implementing "Smart Indent" for the editor to make my (and, likely, your) life easier when it comes to pumping out code. The less time I have to spending formatting my code, the more code I can write and the less frustrated and distracted I get having to fix the formatting. Net result? A happier programmer and more bang for the buck, which is what .NET is all about in the first place, right? ;)

So tonight, if you update your local MonoDevelop svn repository and build the latest revision, you'll be able to test out my first (of many?) gift to you, Smart Indent. May it put a smile on your face and make your dreams come true.

For those interested in how I did it, read on...

I started out taking a look at the old code. The way it worked was that effectively every time the programmer typed <Return>, it would take a peek at the previous line or two in order to get enough context to make a decision on how much to indent the code. Unfortunately, one or two lines is rarely ever enough context to really tell what is going on, so it was really difficult to get indenting right that way. I needed more context... way more context. Ideally, we'd also be able to readjust indenting on-the-fly as the user typed if he typed something that would change the indent level for that line (think: user types a lone '}' on a line to finish off a block).

To do this The Right Way(tm), I would need to keep a parse tree handy and keep it updated as the user typed, constantly checking state to see if the current line indent level needed to be altered. So that's effectively what I did... I wrote a state machine that kept a rough parse tree handy at all times (see Extras/CSharpBinding/FormattingStrategy/CSharpIndentEngine.cs for the code). It's somewhat crude, but it works (although I'm sure it will need some tweaking here and there). In order to keep the parse tree up-to-date, I hooked it up to the editor's KeyPress() event and that's where I did my tinkering with the indent level as well (after querying the CSharpIndentEngine state).

So there you have it...

Saturday, March 3, 2007

Quick Sort

Like Merge Sort, Quick Sort promises O(n lg n) time in the average case, and, like Merge Sort, is also a recursive algorithm. However, it is important to be aware that Quick Sort is O(n²) worst case, where, ironically, the input is already mostly sorted in either direction.

The way Quick Sort works is that it first chooses a pivot point (in our case, we'll just choose the middle element) with which to partition the working sequence into two partitions. Next, it puts all items with values smaller than the pivot value into the first partition and all items with values greater than the pivot value into the second partition. Recursively repeat the process on each partition.

Now that we understand how things are supposed to work, lets implement our QuickSort function (keeping our same call signature from previous sorting posts):

#define MID(lo, hi) (lo + ((hi - lo) >> 1))

#define SWAP(a, b) tmp = array[a], array[a] = array[b], array[b] = tmp

static void
QuickSortR (int *array, size_t lo, size_t hi)
{
    register size_t i, k;
    register int pivot;
    int tmp;
    
    /* make sure we have at least 2 elements */
    if ((hi - lo) < 1)
        return;
    
    /* cache our pivot value */
    pivot = array[MID (lo, hi)];
    
    i = lo;
    k = hi;
    
    do {
        /* find the first element with a value >= pivot value */
        while (i < k && array[i] < pivot)
            i++;
        
        /* find the last element with a value <= pivot value */
        while (k > i && array[k] > pivot)
            k--;
        
        if (i <= k) {
            SWAP (i, k);
            i++;
            k--;
        } else {
            break;
        }
    } while (1);
    
    if (lo < k) {
        /* recursively sort the first partition */
        QuickSortR (array, lo, k);
    }
    
    if (i < hi) {
        /* recursively sort the second partition */
        QuickSortR (array, i, hi);
    }
}

void
QuickSort (int *array, int n)
{
    QuickSortR (array, 0, n - 1);
}

Before we go any further, lets see how this compares to our Merge Sort implementation from earlier.

Since I no longer have the same machine that I timed Merge Sort with, I had to re-time it on my laptop (which is where I've been doing this since I can hack while laying back on my comfortable couch). My laptop is an IBM T40 ThinkPad which sports an Intel Pentium M Processor rated at 1600 MHz.

Since I left off sorting 10,000,000 items in the Merge Sort post, I figured I'd use the same array size here.

For Merge Sort, I'm consistently getting around 6.96s. For our Quick Sort implementation, I'm getting about 5.40s.

It would be nice if we could get rid of the function call overhead of recursion like we did with Binary Insertion Sort.

We're going to have to get crafty in order to work around the need to make our function recursive. What we can do is keep our own stack, but growing it dynamically could potentially kill any performance gains we could hope for... so, lets see what Donald Knuth has to say about the mathematical properties of this sort algorithm.

An auxiliary stack with at most [lg N] entries is needed for temporary storage.

As it happens, log₂ of the max unsigned value of any integer type is the number of bits in said integer type. This means that on my system, where integers are 32bit, I'll need a stack size of 32.

What do we store on our stack? Well, what variables do we need for QuickSortR()? We need array, high, and low. If we're going to make QuickSort() non-recursive, then that means we'll always have access to array which means the only variables we need to hold on our stack are high and low.

So here it is, your Moment of Zen:

typedef struct qstack {
    size_t lo, hi;
} qstack_t;

void
QuickSort (int *array, size_t n)
{
    qstack_t stack[32], *sp;
    register size_t i, k;
    register int pivot;
    size_t lo, hi;
    int tmp;
    
    /* initialize our stack */
    sp = stack;
    sp->hi = n - 1;
    sp->lo = 0;
    sp++;
    
    do {
        /* pop our lo and hi indexes off the stack */
        sp--;
        lo = sp->lo;
        hi = sp->hi;
        
        if ((hi - lo) < 1)
            continue;
        
        /* cache our pivot value */
        pivot = array[MID (lo, hi)];
        
        i = lo;
        k = hi;
        
        do {
            /* find the first element with a value >= pivot value */
            while (i < k && array[i] < pivot)
                i++;
            
            /* find the last element with a value <= pivot value */
            while (k > i && array[k] > pivot)
                k--;
            
            if (i <= k) {
                SWAP (i, k);
                i++;
                k--;
            } else {
                break;
            }
        } while (1);
        
        if (lo < k) {
            /* push the first partition onto our stack */
            sp->lo = lo;
            sp->hi = k;
            sp++;
        }
        
        if (i < hi) {
            /* push the second partition onto our stack */
            sp->lo = i;
            sp->hi = hi;
            sp++;
        }
    } while (sp > stack);
}

Once again, we see a slight improvement in execution time: 5.10s. Didn't seem to help all that much, but was worth giving it a shot. Might be more beneficial on non-x86 platforms, though...

As I mentioned in the opening paragraph, one of the major problems with Quick Sort is that for already sorted (or nearly sorted) inputs, Quick Sort hits its pathological corner case and becomes extremely inefficient compared to other sorts. One of the more popular solutions to this problem (as used by most libc qsort() implementations) is to fall back to Insertion Sort once an input segment is small enough. I'll leave this up to the reader to consider ways of implementing this approach.

To learn more about sorting, I would recommend reading Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition)

25 Byte Sort Routine

On the topic of sorting, I came across something cool the other day in Michael Abrash's Graphics Programming Black Book that I thought I'd share: a 25-byte long integer sort routine (that is to say, the routine itself is only 25 bytes long).

Here it is, your Moment of Zen:

;---------------------------------------------------------------
; Sorts an array of ints. C-callable (small model). 25 bytes.
; void sort (int n, int a[]);
;
; Courtesy of David Stafford.
;---------------------------------------------------------------

      .model small
      .code
        public _sort

top:    mov     dx,[bx]     ; swap two adjacent ints
        xchg    dx,[bx+2]
        xchg    dx,[bx]

        cmp     dx,[bx]     ; did we put them in the right order?
        jl      top         ; no, swap them back

        inc     bx          ; go to the next integer
        inc     bx
        loop    top

_sort:  pop     dx          ; get return address (entry point)
        pop     cx          ; get count
        pop     bx          ; get pointer
        push    bx          ; restore pointer
        dec     cx          ; decrement count
        push    cx          ; save count
        push    dx          ; save return address
        jg      top         ; if cx > 0

        ret

      end

A Moment of Zen