Saturday, March 3, 2007

Quick Sort

Like Merge Sort, Quick Sort promises O(n lg n) time in the average case, and, like Merge Sort, is also a recursive algorithm. However, it is important to be aware that Quick Sort is O(n2) worst case, where, ironically, the input is already mostly sorted in either direction.

The way Quick Sort works is that it first chooses a pivot point (in our case, we'll just choose the middle element) with which to partition the working sequence into two partitions. Next, it puts all items with values smaller than the pivot value into the first partition and all items with values greater than the pivot value into the second partition. Recursively repeat the process on each partition.

Now that we understand how things are supposed to work, lets implement our QuickSort function (keeping our same call signature from previous sorting posts):

#define MID(lo, hi) (lo + ((hi - lo) >> 1))

#define SWAP(a, b) tmp = array[a], array[a] = array[b], array[b] = tmp

static void
QuickSortR (int *array, size_t lo, size_t hi)
{
    register size_t i, k;
    register int pivot;
    int tmp;
    
    /* make sure we have at least 2 elements */
    if ((hi - lo) < 1)
        return;
    
    /* cache our pivot value */
    pivot = array[MID (lo, hi)];
    
    i = lo;
    k = hi;
    
    do {
        /* find the first element with a value >= pivot value */
        while (i < k && array[i] < pivot)
            i++;
        
        /* find the last element with a value <= pivot value */
        while (k > i && array[k] > pivot)
            k--;
        
        if (i <= k) {
            SWAP (i, k);
            i++;
            k--;
        } else {
            break;
        }
    } while (1);
    
    if (lo < k) {
        /* recursively sort the first partition */
        QuickSortR (array, lo, k);
    }
    
    if (i < hi) {
        /* recursively sort the second partition */
        QuickSortR (array, i, hi);
    }
}

void
QuickSort (int *array, int n)
{
    QuickSortR (array, 0, n - 1);
}

Before we go any further, lets see how this compares to our Merge Sort implementation from earlier.

Since I no longer have the same machine that I timed Merge Sort with, I had to re-time it on my laptop (which is where I've been doing this since I can hack while laying back on my comfortable couch). My laptop is an IBM T40 ThinkPad which sports an Intel Pentium M Processor rated at 1600 MHz.

Since I left off sorting 10,000,000 items in the Merge Sort post, I figured I'd use the same array size here.

For Merge Sort, I'm consistently getting around 6.96s. For our Quick Sort implementation, I'm getting about 5.40s.

It would be nice if we could get rid of the function call overhead of recursion like we did with Binary Insertion Sort.

We're going to have to get crafty in order to work around the need to make our function recursive. What we can do is keep our own stack, but growing it dynamically could potentially kill any performance gains we could hope for... so, lets see what Donald Knuth has to say about the mathematical properties of this sort algorithm.

An auxiliary stack with at most [lg N] entries is needed for temporary storage.

As it happens, log2 of the max unsigned value of any integer type is the number of bits in said integer type. This means that on my system, where integers are 32bit, I'll need a stack size of 32.

What do we store on our stack? Well, what variables do we need for QuickSortR()? We need array, high, and low. If we're going to make QuickSort() non-recursive, then that means we'll always have access to array which means the only variables we need to hold on our stack are high and low.

So here it is, your Moment of Zen:

typedef struct qstack {
    size_t lo, hi;
} qstack_t;

void
QuickSort (int *array, size_t n)
{
    qstack_t stack[32], *sp;
    register size_t i, k;
    register int pivot;
    size_t lo, hi;
    int tmp;
    
    /* initialize our stack */
    sp = stack;
    sp->hi = n - 1;
    sp->lo = 0;
    sp++;
    
    do {
        /* pop our lo and hi indexes off the stack */
        sp--;
        lo = sp->lo;
        hi = sp->hi;
        
        if ((hi - lo) < 1)
            continue;
        
        /* cache our pivot value */
        pivot = array[MID (lo, hi)];
        
        i = lo;
        k = hi;
        
        do {
            /* find the first element with a value >= pivot value */
            while (i < k && array[i] < pivot)
                i++;
            
            /* find the last element with a value <= pivot value */
            while (k > i && array[k] > pivot)
                k--;
            
            if (i <= k) {
                SWAP (i, k);
                i++;
                k--;
            } else {
                break;
            }
        } while (1);
        
        if (lo < k) {
            /* push the first partition onto our stack */
            sp->lo = lo;
            sp->hi = k;
            sp++;
        }
        
        if (i < hi) {
            /* push the second partition onto our stack */
            sp->lo = i;
            sp->hi = hi;
            sp++;
        }
    } while (sp > stack);
}

Once again, we see a slight improvement in execution time: 5.10s. Didn't seem to help all that much, but was worth giving it a shot. Might be more beneficial on non-x86 platforms, though...

As I mentioned in the opening paragraph, one of the major problems with Quick Sort is that for already sorted (or nearly sorted) inputs, Quick Sort hits its pathological corner case and becomes extremely inefficient compared to other sorts. One of the more popular solutions to this problem (as used by most libc qsort() implementations) is to fall back to Insertion Sort once an input segment is small enough. I'll leave this up to the reader to consider ways of implementing this approach.

To learn more about sorting, I would recommend reading Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition)

4 comments:

Anonymous said...

You should really have a look at the paper titled "Engineering a Sort Function", it describes an optimized qsort implementation that already has all the goodies you're mentioning.

For some reason, it is used by the BSD C Library, but not the GNU C one. Go figure...

Jeffrey Stedfast said...

Cool, thanks for the info!

Anonymous said...

You're missing something.. oh... yeah there it is: Why not just template the thing, and instantly you have a wicked fast sort for any type? Oh wait.. that's what STL does already in std::sort, which consistently beats qsort and is likely at least on par with your algorithm.

And there we go, another reinvention of the wheel.
:)

Anonymous said...

I would like to implement your algorithm in the 68000 language. Any suggestions would be great. Nice BLOGG This is for a class I'm taking in school..

Thanks for the Help
splitline@gmail.com

Code Snippet Licensing

All code posted to this blog is licensed under the MIT/X11 license unless otherwise stated in the post itself.