Sunday, February 25, 2007

Merge Sort

Note: This is a repost from an old weblog.

Like Binary Insertion Sort, Merge Sort is a recursive algorithm, but Merge Sort is O(n lg n) whereas Binary Insertion Sort was O(n²). It works by splitting the set into halves until each subset is down to 1 item, then merging the subsets back together again in sorted order. It's a very simple concept, although a bit hard to explain in words. Allow me to illustrate instead:

Given: [6 8 1 3 7 2 5 4]

The first thing we do is split this set in half:

[6 8 1 3] [7 2 5 4]

We keep splitting in half until we can't split anymore:

[6 8] [1 3] [7 2] [5 4]

[6] [8] [1] [3] [7] [2] [5] [4]

We now merge the items back into their previous sets in sorted order:

[6 8] [1 3] [2 7] [4 5]

And again:

[1 3 6 8] [2 4 5 7]

And again:

[1 2 3 4 5 6 7 8]

Not so bad, right? So how do we implement this? Well, breaking the sets into halves is fairly straightforward, so I won't bother explaining that. The “hard” part is the merging, and even that's not so hard. Let me illustrate step by step how we merged 2 sets back into a single set:

set #1: [2 7]
set #2: [4 5]

We start by comparing the first item in each set. Is 2 less than 4? Yes, so we put 2 into our “merged” set:

set #1: [7]
set #2: [4 5]
merged: [2]

Now compare 7 and 4. Since 4 is less than 7, we append 4 into our merged set:

set #1: [7]
set #2: [5]
merged: [2 4]

Now compare 7 and 5. Since 5 is less than 7, we append 5 into our merged set:

set #1: [7]
set #2: []
merged: [2 4 5]

Since set #2 is out of items, we just append the remainder of set #1 into our merged set:

merged: [2 4 5 7]

Tada!

As you can probably see, it looks like we'll need a temporary array to hold our merged set as we merge the 2 subsets back together. If we want to keep to the same API we've been using for the previous sorting routines, we'll need a second function that will do most of the work. Since we know we'll never need our temporary array to be larger than the original input array, we might as well create it in our main function and pass it off to our recursive worker function:

#include <stdlib.h>
#include <string.h>    /* for the memcpy() in msort() below */

static void msort (int a[], int b[], int lo, int hi);

void
MergeSort (int a[], int n)
{
    int *b;

    /* the temporary array never needs to be larger than the input */
    if (!n || !(b = malloc (sizeof (int) * n)))
        return;

    msort (a, b, 0, n - 1);

    free (b);
}

As you can see here, I called my worker function msort(), which takes as input the original array, the temporary array (b[]), the offset of the first item, and the offset of the last item. Now let's take a look at msort()'s implementation:

static void
msort (int a[], int b[], int lo, int hi)
{
    register int h, i, l;
    int mid;

    if (lo >= hi)
        return;

    /* recursively sort each half of the subrange */
    mid = lo + ((hi - lo) / 2);
    msort (a, b, lo, mid);
    msort (a, b, mid + 1, hi);

    /* merge the two sorted halves into b[] */
    for (i = 0, l = lo, h = mid + 1; l <= mid && h <= hi; i++) {
        if (a[l] <= a[h])
            b[i] = a[l++];
        else
            b[i] = a[h++];
    }

    /* append whatever remains of the half that isn't empty yet */
    while (l <= mid)
        b[i++] = a[l++];
    while (h <= hi)
        b[i++] = a[h++];

    /* copy the merged result back into a[] */
    memcpy (a + lo, b, sizeof (int) * i);
}

You'll notice that the first thing we do is find the midpoint of our set starting with lo and ending with hi. We then call msort() on each of the 2 resulting subsets (lo to mid, and mid + 1 to hi). After msort() is finished with each of those subsets, the 2 subsets will be in sorted order and will be stored back in array a[] at positions lo through mid and mid + 1 through hi.

We then need to merge the 2 subsets back together again into a sorted set starting at position lo and ending at position hi. We'll use i to represent the next position to add an item into our temporary array, b[]. We'll use l and h to represent the cursor for each of the subsets (l for low and h for high). Notice that I check a[l] <= a[h] rather than a[l] > a[h]; this is so that items with the same value in the array stay in the same order they were in originally. For integer arrays, this is no big deal - but it is often a preferred feature when sorting arrays of custom data. Sorting algorithms that have this property are called “stable sorts”.
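For example, if we were sorting address-book records by last name, a stable sort guarantees that two entries named “Smith” come out in the same relative order they went in - which matters if the list had already been sorted by first name beforehand.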

The for-loop merges 1 item from one of the 2 subsets into the temporary array until one of the subsets becomes empty. At this point, one of the 2 while-loops will be executed, appending the remainder of the non-empty set to the merged set.

Now that we've merged the 2 subsets back together into our temporary array, b[], we need to copy the result back into a[] at the subrange we started with.

And there you have it.

Since I'm sure you're as anxious as I am to find out how fast this is compared to our previous sorts, let's give it a whirl at sorting 100,000 items and find out. Wow, that was fast - it averaged 0.044s for sorting that 100,000 item array of random integers. That's the fastest time yet! Now that we have such a fast algorithm for sorting, we'll need to bump up our array size in order to compare further sorting algorithms. Let's go for 10,000,000. This time I get 6.344s, which is still faster than most of our previous sort algorithms were at 100,000 items!

The first optimization I see is that we could replace those while-loops with calls to memcpy(). It turns out that doesn't really change the results for our particular input (it might make a difference if most of the values in one subset were larger than those in the other, but that isn't the case for our random input).
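If you'd like to try that experiment yourself, here's a minimal sketch of the change. Since only one of the two halves can have items remaining when the merge loop exits, the two while-loops at the end of msort() collapse into a single memcpy():

if (l <= mid) {
    /* the low half has items left over: copy them in one block */
    memcpy (b + i, a + l, sizeof (int) * (mid - l + 1));
    i += mid - l + 1;
} else if (h <= hi) {
    /* otherwise the high half does */
    memcpy (b + i, a + h, sizeof (int) * (hi - h + 1));
    i += hi - h + 1;
}

Note that i still has to be updated, since the final memcpy() back into a[] uses it as the count.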

To learn more about sorting, I would recommend reading The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition)

Update: I've written a follow-up article on Optimizing Merge Sort that may be of interest as well. This follow-up article deals with optimizing a general-purpose Merge Sort implementation to be on par with the libc qsort() function.

Shell Sort

Note: This is a repost from an old weblog.

The Shell Sort algorithm is designed to move items over large distances each iteration. The idea is that by comparing items that are far apart, each item gets closer to its final destination more quickly, saving a lot of shuffling later on.

The way it works is it subdivides the dataset into smaller groups where each item in the group is a set distance apart. For example, if we use h to represent our distance and R to represent an item, we might have groups: { R_1, R_(h+1), R_(2h+1), ... }, { R_2, R_(h+2), R_(2h+2), ... }, ...

We then sort each subgroup individually.

We keep repeating the above process, continually reducing h, until h becomes 1. After one last run-through with h equal to 1, we stop.
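To make that concrete (reusing the 8-item array from the Merge Sort post), suppose h = 3 and we're given [6 8 1 3 7 2 5 4]. The groups are {6, 3, 5} (positions 1, 4, 7), {8, 7, 4} (positions 2, 5, 8) and {1, 2} (positions 3, 6). Sorting each group in place gives [3 4 1 5 7 2 6 8] - nothing is fully sorted yet, but every item has jumped much closer to where it belongs, so the final h = 1 pass (which is just a plain Insertion Sort) has very little work left to do.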

At this point, I'm sure you are wondering where h comes from and what values to reduce it by each time through. If and when you ever figure it out, let me know - you might also want to publish a paper and submit it to the ACM so that your name might go down in the history (and algorithm) books. That's right, as far as I'm aware, no one knows the answer to this question, except, perhaps, God Himself :-)

If you are interested, you might take a look at Donald Knuth's book, The Art of Computer Programming, Volume 3: Sorting and Searching (starting on page 83), for some mathematical discussion on the matter.

As far as I understand from Sorting and Searching, it is theoretically possible to get the Shell Sort algorithm to approach O(n^1.5) given an ideal increment table, which is quite impressive.

Knuth gives us a couple of tables to start with: [8 4 2 1] and [7 5 3 1]. These seem to work okay, but they are far from ideal for the 100,000 item array we are trying to sort in this exercise. However, for the sake of keeping our first implementation simple, we'll use the [7 5 3 1] table, since it has the charming property that each increment is 2 smaller than the previous one. Yay for simplicity.

void
ShellSort (int a[], int n)
{
    int h, i, j;
    int tmp;

    /* insertion sort the items h apart, for h = 7, 5, 3 and finally 1 */
    for (h = 7; h >= 1; h -= 2) {
        for (i = h; i < n; i++) {
            tmp = a[i];
            for (j = i; j >= h && a[j - h] > tmp; j -= h)
                a[j] = a[j - h];
            a[j] = tmp;
        }
    }
}

The nice thing about Shell Sort is that, while the increment table is a complete mystery, the algorithm itself is quite simple and well within our grasp.

Let's plug this into our sort program and see how well it does.

I seem to get a pretty consistent 6.3 seconds for 100,000 items on my AMD Athlon XP 2500 system which is almost as good as the results we were getting from our Binary Insertion Sort implementation.

Now for some optimizations. We know that an ideal set of increment sizes will get us down close to O(n^1.5), and it is unlikely that the [7 5 3 1] set is ideal, so I suggest we start there.

On a hunch, I just started adding more and more primes to our [7 5 3 1] table and noticed that with each new prime added, it seemed to get a little faster. At some point I decided to experiment a bit and tried a set of primes spaced farther apart, and noticed that with a much smaller set of increments I was able to get about the same performance as with my much larger set of primes. This spurred me on some more, and I eventually came up with the following set:

{ 14057, 9371, 6247, 4177, 2777, 1861, 1237, 823, 557, 367, 251, 163, 109, 73, 37, 19, 11, 7, 5, 3, 1 }

In order to use this set, however, we need a slightly more complicated method for determining the next value for h than just h -= 2, so I used a lookup table instead:

static int htab[] = { 14057, 9371, 6247, 4177, 2777, 1861, 1237, 823, 557, 367, 251, 163, 109, 73, 37, 19, 11, 7, 5, 3, 1, 0 };

void
ShellSort (int a[], int n)
{
    int *h, i, j;
    int tmp;

    /* walk the increment table until we hit the 0 terminator */
    for (h = htab; *h; h++) {
        for (i = *h; i < n; i++) {
            tmp = a[i];
            for (j = i; j >= *h && a[j - *h] > tmp; j -= *h)
                a[j] = a[j - *h];
            a[j] = tmp;
        }
    }
}

With this new table of increments, I was able to achieve an average sort time of about 0.09 seconds. In fact, ShellSort() in combination with the above increment table will sort an array of 2 million items in about the same amount of time as our optimized BinaryInsertionSort() took to sort 100 thousand items. That's quite an improvement, wouldn't you say!?

To learn more about sorting, I would recommend reading The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition)

Binary Insertion Sort

Note: This is a repost from an old weblog.

The Binary variant of Insertion Sort uses a binary search to find the appropriate location to insert the new item into the output. Before we can implement this variant of the insertion sort, we first need to know and understand how the binary search algorithm works:

Binary Search

Binary searching, for those who don't know (no need to raise your hand in shame), is a method of searching (duh) in which you compare as few items in the dataset as possible to the item you are looking for, by halving your dataset with each comparison you make. Hence the word binary (meaning two). Allow me to illustrate:

Given the following dataset of 9 items in sorted order, find the value 8:

[1 2 3 4 5 6 7 8 9]

The first step is to check the midpoint (there are 9 elements, so the midpoint would be (9+1)/2 which is the 5th element):

[1 2 3 4 (5) 6 7 8 9]

Well, 5 is not 8 so we're not done yet. Is 8 less than 5? No. Is 8 greater than 5? Yes. So we know that the value we seek is in the second half of the array (if it's there). We are now left with:

[6 7 8 9]

Oops. 4 does not have a midpoint, so we round off any way we want to (it really doesn't matter). (4+1)/2 = 2.5, so for the sake of argument let's just drop the decimal and check the 2nd element (this is how integer math works anyway, so it's simpler):

[6 (7) 8 9]

Well, 7 is not 8 so we're not done yet. Is 8 less than 7? No. Is 8 greater than 7? Yes. So we know that the value we seek is in the second half of the array (if it's there). We are now left with:

[8 9]

Again we find the midpoint, which in this case is the 1st element.

[(8) 9]

Does 8 equal 8? Yes! We're done!

How many tries did that take us? Three. Three tries and we found what we were looking for. And, in the worst possible case for this particular array (searching for the 9), we would have been able to do it in 4 tries.
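In general, a binary search makes at most about lg n comparisons for n sorted items: 4 for our 9-item array, and only around 17 for an array of 100,000.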

Not bad, huh?

To implement our BinarySearch() function, the easiest way will be to write it using recursion. Recursion is a technique that allows us to break the problem into smaller and smaller pieces (or, in this case, the array - as illustrated above) until we arrive at the solution (or, in this case, the item we are looking for). Recursion is implemented by having a function call itself to process a subset of the data, which may again call itself, repeating until the problem is solved.

Notice in my above explanation of how to search the 9-item array using the binary search algorithm how I continually break the array into smaller and smaller subsets? That's why binary search is often described as a recursive algorithm - you keep repeating the process on smaller and smaller subsets until your subset cannot get any smaller or until you find what you are looking for.

Hopefully I've explained that well enough; if not, perhaps taking a look at the following implementation will help:

int
BinarySearch (int a[], int low, int high, int key)
{
    int mid;

    mid = low + ((high - low) / 2);

    /* note: this version assumes key actually exists in a[] */
    if (key > a[mid])
        return BinarySearch (a, mid + 1, high, key);
    else if (key < a[mid])
        return BinarySearch (a, low, mid, key);

    return mid;
}

Note: To get the midpoint, we use the formula low + ((high - low) / 2) instead of (high + low) / 2 because we want to avoid the possibility of an integer overflow.
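For example, if low and high were both around 2 billion (near the top of a signed 32-bit int's range), high + low would overflow, while high - low always stays comfortably in range.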

In my above implementation, a[] is our integer array, low is the low-point of the subset in which we are looking, high is the high-point of the subset in which we are looking, and key is the item that we are looking for. It returns the integer index of the value we are looking for in the array. Here's how we would call this function from our program: BinarySearch (a, 0, 9, 8);

Remember that in C, array indexes start at 0. We pass 9 as the high-point because there are 9 elements in our array. I don't think I need to explain why we pass 8 as the key.
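If you'd like to try it standalone, a quick test harness might look something like this (my own sketch; it assumes the BinarySearch() definition above appears earlier in the same file):

#include <stdio.h>

int main (void)
{
    int a[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };

    /* prints "8 found at index 7" */
    printf ("8 found at index %d\n", BinarySearch (a, 0, 9, 8));

    return 0;
}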

If we then plug this into our Insertion Sort algorithm from yesterday (instead of doing a linear search), we get a Binary Insertion Sort... almost. There's one minor change we have to make to our BinarySearch() function first: since we are not necessarily trying to find an already-existing item in our output, we need to adjust BinarySearch() to return the ideal location for our key even when it doesn't yet exist. Don't worry, this is a very simple adjustment:

int
BinarySearch (int a[], int low, int high, int key)
{
    int mid;

    if (low == high)
        return low;

    mid = low + ((high - low) / 2);

    if (key > a[mid])
        return BinarySearch (a, mid + 1, high, key);
    else if (key < a[mid])
        return BinarySearch (a, low, mid, key);

    return mid;
}

Okay, now we're ready to plug it in. Here's the result:

void
BinaryInsertionSort (int a[], int n)
{
    int ins, i, j;
    int tmp;

    for (i = 1; i < n; i++) {
        ins = BinarySearch (a, 0, i, a[i]);
        tmp = a[i];
        for (j = i - 1; j >= ins; j--)
            a[j + 1] = a[j];
        a[ins] = tmp;
    }
}

There are a couple of optimizations we can make here. If ins is equal to i, then we don't need to shift any items, nor do we need to set a[ins] to the value in a[i], so we can wrap those last 4 lines in an if-statement block:

void
BinaryInsertionSort (int a[], int n)
{
    int ins, i, j;
    int tmp;

    for (i = 1; i < n; i++) {
        ins = BinarySearch (a, 0, i, a[i]);
        if (ins < i) {
            tmp = a[i];
            for (j = i - 1; j >= ins; j--)
                a[j + 1] = a[j];
            a[ins] = tmp;
        }
    }
}

If you remember from our Insertion Sort optimizations, the memmove() function really helped speed up shifting a lot of items all at once. But before we plug that in, let's see what kind of times our current implementation takes to sort 100,000 items (just so we have something to compare against after we make the memmove() optimization).

On my AMD Athlon XP 2500, it takes roughly 25 seconds to sort 100,000 random items (using the main() function from our Bubble Sort analysis a few days ago). That's already on par with yesterday's optimized insertion sort implementation and we haven't even plugged in memmove() yet!

I'm pretty excited and can't wait to plug in memmove() to see what kind of results we get, so here we go:

#include <string.h>    /* for memmove() */

void
BinaryInsertionSort (int a[], int n)
{
    int ins, i;
    int tmp;

    for (i = 1; i < n; i++) {
        ins = BinarySearch (a, 0, i, a[i]);
        if (ins < i) {
            tmp = a[i];
            /* shift a[ins..i-1] right one place in a single call */
            memmove (a + ins + 1, a + ins, sizeof (int) * (i - ins));
            a[ins] = tmp;
        }
    }
}

After plugging in memmove(), our BinaryInsertionSort() sorts 100,000 items in about 5.5 seconds! Whoo!

Let's recap: We started off with the goal of sorting 100,000 items using Bubble Sort, which took over 103 seconds to accomplish. We then scratched our heads and optimized the Bubble Sort algorithm the best we could and obtained a result that was twice as fast. Then we traded our BubbleSort() implementation in for another O(n²) algorithm (generally not something I would suggest wasting time on, but for the sake of learning about a bunch of sorting algorithms, why not?). Using InsertionSort(), we got that down to 26.5 seconds, which is 4 times faster than our original sort! But wait, then we took our InsertionSort() one step farther and used a binary search algorithm to help us find the ideal location in which to insert our item, and we got it down to a whopping 5.5 seconds - nearly 20 times faster than what we started with!

That's pretty awesome if you ask me...

There's still one more optimization we can try on our BinaryInsertionSort() implementation to attempt to get faster speeds, but it may be over your head if you're a beginner so feel free to ignore my following code dump:

void
BinaryInsertionSort (int a[], int n)
{
    register int i, m;
    int hi, lo, tmp;

    for (i = 1; i < n; i++) {
        lo = 0, hi = i;
        m = i / 2;

        /* an inlined, iterative binary search for the proper
         * position of a[i] within the sorted subset a[0..i-1] */
        do {
            if (a[i] > a[m]) {
                lo = m + 1;
            } else if (a[i] < a[m]) {
                hi = m;
            } else
                break;

            m = lo + ((hi - lo) / 2);
        } while (lo < hi);

        if (m < i) {
            tmp = a[i];
            memmove (a + m + 1, a + m, sizeof (int) * (i - m));
            a[m] = tmp;
        }
    }
}

The basic idea here was that I wanted to get rid of recursion because, as nice as recursion is for visualizing (and implementing) our binary search, there is a penalty every time we recurse: function call overhead (which slows us down) and memory overhead (we need to grow the stack). If we're clever enough, we can sometimes bypass the need for recursion like I did above, which can often lead to significant performance gains.

In this case, however, we didn't improve the performance all that much (down from 5.5 to 5.2 seconds), but on other architectures (such as SPARC) it may be a lot more noticeable. It may also be a lot more noticeable for larger datasets. In fact, let's try sorting 200,000 items and see what sort of times we get.

Ah, much more noticeable now. With recursion, it takes 46 seconds and without, it takes 37 seconds - so there is definitely an advantage to cleverly working around the need for recursion.

Okay, that's all I've got for Binary Insertion Sort, but there are still plenty more sorting algorithms to try before we call it a day - we've just barely scratched the surface - and so far we've only looked into the O(n²) algorithms. There are others that promise O(n lg n) performance!

To learn more about sorting, I would recommend reading The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition)

Insertion Sort

Note: This is a repost from an old weblog.

Insertion Sort is another O(n²) sorting algorithm (just like Bubble Sort was), but it is a little bit faster. Now, ideally, when trying to optimize by picking a new algorithm, you don't want to go from one slow algorithm to another, but I figure while I'm explaining sorting techniques I might as well explain a bunch of them. Anyways... it turns out that Insertion Sort is faster than Bubble Sort in the general case.

The way Insertion Sort works is that you move items from your input dataset into an output dataset one at a time: take an item from the input and place it in the output at the proper position such that the output dataset is always in sorted order. Allow me to demonstrate:

Given: [5 8 2 9 6 3 7 1 0 4]

That's our input dataset. Our output dataset is so far empty. The first thing we do is take the first item of our input dataset (or any item, really, but we might as well grab them out of the input dataset in order, right?) and place it in the output dataset. Since the output dataset will only have 1 item this first round, there's nothing to sort. The result is:

Input: [8 2 9 6 3 7 1 0 4] Output: [5]

Now take the next input item and place it in the output, but in the proper sorted order:

Input: [2 9 6 3 7 1 0 4] Output: [5 8]

Lather. Rinse. Repeat.

Input: [9 6 3 7 1 0 4] Output: [2 5 8]
Input: [6 3 7 1 0 4] Output: [2 5 8 9]
Input: [3 7 1 0 4] Output: [2 5 6 8 9]
Input: [7 1 0 4] Output: [2 3 5 6 8 9]
Input: [1 0 4] Output: [2 3 5 6 7 8 9]
Input: [0 4] Output: [1 2 3 5 6 7 8 9]
Input: [4] Output: [0 1 2 3 5 6 7 8 9]
Input: [] Output: [0 1 2 3 4 5 6 7 8 9]

Tada! We have a sorted dataset. This is the basis of all insertion sort variants; the difference between the various insertion sorts (I only know of 2, but wouldn't be surprised if there were more) is the way in which we find the ideal place to insert the new item from the input into the output.

Before I get into the different variants, however, let me address one of the concerns you likely have thus far: if we're sorting a large-ish array (note: if we are sorting a linked list, then this isn't an issue), we don't want to have two arrays because that doubles the amount of memory we're using!

Indeed, but if you'll notice: as the output size increases, the input size decreases - together, the input and output sizes always total the same number of elements, and if we iterate through the input elements in order, we can share the same array as both the input and the output. Let me re-illustrate:

Output | Input
[][5 8 2 9 6 3 7 1 0 4]
[5 ][8 2 9 6 3 7 1 0 4]
[5 8 ][2 9 6 3 7 1 0 4]
[2 5 8 ][9 6 3 7 1 0 4]
...

See how we can cheat now? Visualization is the key to solving so many problems. Don't be afraid to take a pencil and paper and "draw" the problem - many times the solution will present itself.

Linear Insertion Sort

Generally when someone refers to Insertion Sort, this is the variant they are talking about. As I was saying above, the difference between the variants is how they find the proper position in which to insert the new item into the output. As the name suggests, Linear Insertion Sort uses a linear search to find this position: scan through the output array starting at the first position; if the new item is larger than the value there, move on to the next output item, and so on, until the new item's value is less than the item we are comparing against - at that point, we have found the position to insert the new item.

Note: we could actually get away with using <= to minimize the number of comparisons we have to do, but then multiple items with the same value wouldn't remain in the order they appeared in the input. If that doesn't bother you, then definitely use <=. Sorts that guarantee that items with the same comparison value come out of the sort in the same order they went in are called "stable sorts", and stability is usually a pretty desirable trait (unless performance is more important). The lucky thing for us, though, is that in this particular case, keeping duplicate items in the order we found them forces us to work backwards starting at the end of the output array rather than forwards starting at the beginning - which just so happens to allow us a more promising optimization. Pretty crafty, eh? I thought so...

One thing I should remind you about is that when you insert an item into the output, you'll need to shift all the items after that position one place to the right.
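For example, inserting 6 into [2 5 8 9] means shifting the 8 and the 9 one place to the right, giving [2 5 6 8 9].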

Okay, on to the implementation...

The first optimization you should notice is that after the first "loop", the first item of the input is always the first item of the output, so we might as well start with the second item of the input array. So with that, go ahead and implement this algorithm - you should get something similar to this:

void
InsertionSort (int a[], int n)
{
    int i, j = 0;
    int key;

    for (i = 1; i < n; j = i, i++) {
        key = a[i];
        /* working backwards, shift items right until
         * we find the proper position for key */
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}

You'll notice that I took the "work backwards starting at the end of the output array" approach that I noted above. As I started to mention, this approach has the added benefit of allowing us to shift the output items as we go which means we get to shift fewer items per loop on average than if we had worked in the forward direction, resulting in a performance gain (shifting a bunch of items is more expensive than an integer comparison).

If we feed this into our sort program and have main() call this new function rather than the BubbleSort() implementation we wrote yesterday, we find that we have a sort that is significantly faster - down to 36 seconds from the optimized BubbleSort()'s 51 seconds. Not bad...

See any obvious improvements that might make our InsertionSort() significantly faster? I don't see anything I would consider major, but we might try using memmove() instead of manually moving items one space to the right. We'll get the most bang for the buck if we wait until we find our optimal insertion point before moving anything around, so:

#include <string.h>    /* for memmove() */

void
InsertionSort (int a[], int n)
{
    int i, j = 0;
    int key;

    for (i = 1; i < n; j = i, i++) {
        key = a[i];
        while (j >= 0 && a[j] > key)
            j--;
        /* shift a[j+1..i-1] right one place in a single call */
        memmove (a + (j + 2), a + (j + 1), sizeof (int) * ((i - 1) - j));
        a[j + 1] = key;
    }
}

Okay, so with that very simple adjustment, we got our total sort time from 36 seconds down to 26.5 seconds. Nothing to scoff at, surely, but at the same time not as drastic an improvement as changing algorithms had been (see why algorithms are so important for optimizations?).

Now just imagine if we used an algorithm that promised better than O(n²) performance!

Before we jump to something better than O(n²) though, I still have one more trick up my sleeve to improve our insertion sort (not the implementation this time, but rather the algorithm). I dub thee: Binary Insertion Sort.

To learn more about sorting, I would recommend reading The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition)

Bubble Sort

Note: This is a repost from an old weblog.

Bubble Sort works by iterating through the dataset, comparing two neighboring items at a time, and swapping them if the first item is larger than the second item. In order to sort a dataset of N items, this operation must be done N-1 times. Let me illustrate:

Given:

[5 8 2 9 6 3 7 1 0 4]

We start by comparing the first two items, 5 and 8:

[(5 8) 2 9 6 3 7 1 0 4]

Since 5 is smaller than 8, we don't need to swap them.

Next, we compare the second pair of items, 8 and 2:

[5 (8 2) 9 6 3 7 1 0 4]

8 is larger than 2, so we have to swap them, resulting in:

[5 2 8 9 6 3 7 1 0 4]

We keep doing this until we reach the end of the dataset and then we'll start all over again - repeating the process N-1 times (in this example, 9 times since we have 10 items).

[5 2 (8 9) 6 3 7 1 0 4]
[5 2 8 (9 6) 3 7 1 0 4]
[5 2 8 6 (9 3) 7 1 0 4]
[5 2 8 6 3 (9 7) 1 0 4]
[5 2 8 6 3 7 (9 1) 0 4]
[5 2 8 6 3 7 1 (9 0) 4]
[5 2 8 6 3 7 1 0 (9 4)]

At the end of our first loop, we end up with:

[5 2 8 6 3 7 1 0 4 9]

Now we start the whole process over again (we have 8 more loops to go!).

...

[2 5 6 3 7 1 0 4 (8 9)]

Notice anything interesting at the very end? You guessed it! That last comparison is pretty worthless - the last item in the dataset is always going to be the largest value after the first loop, right? Right, but that's not all - with each loop through the items, we can ignore more and more items at the end of the dataset (after the second loop, we can ignore the last 2 items; after the third loop, the last 3 items; and so on). That little trick is our first optimization for this algorithm!

Let's see what the code would look like:

void
BubbleSort (int a[], int n)
{
    int i, j, tmp;
    
    for (i = n - 1; i >= 0; i--) {
        for (j = 0; j < i; j++) {
            if (a[j] > a[j + 1]) {
                tmp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = tmp;
            }
        }
    }
}

Now that we have some code to test, let's see how long it takes to sort a random array of 100,000 integers. To do that, we'll need to write a little program to use our BubbleSort() routine:

#include <stdlib.h>
#include <time.h>

void BubbleSort (int a[], int n);

int main (int argc, char **argv)
{
    int array[100000], i;

    srand (time (NULL));

    /* fill the array with random integers */
    for (i = 0; i < 100000; i++)
        array[i] = rand ();

    BubbleSort (array, 100000);

    return 0;
}

We'll just use our system's time command to get an estimate of how long it takes to sort.
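For example, assuming the code was saved as sort.c: gcc sort.c -o sort && time ./sort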

Go ahead and run our little program a few times. For me, it seems to average about 103 seconds on my AMD Athlon XP 2500.

I don't know about you, but I saw a pretty obvious optimization that we could make to our BubbleSort() implementation that might make it a bit faster. Remember how we noticed that each full run of the inner loop had the net result of moving the largest item all the way to the right (discounting the largest items from previous loops)? What if, instead of swapping as we went through the inner loop, we waited until the inner loop finished and then did a single swap? Let's try it:

void
BubbleSort (int a[], int n)
{
    int i, j, max, tmp;

    for (i = n - 1; i >= 0; i--) {
        /* find the index of the largest item in a[0..i] */
        for (max = 0, j = 1; j < i + 1; j++) {
            if (a[j] > a[max])
                max = j;
        }

        /* swap it into its final position at a[i] */
        if (max < i) {
            tmp = a[max];
            a[max] = a[i];
            a[i] = tmp;
        }
    }
}

In the above code, we use max to hold the index of the largest item we find. We initialize it to the index of the first item (0 in languages like C) and then start our inner for-loop. You'll notice a change here: instead of starting j at 0, we start it at 1 because we've already "looked" at the first item. Also, we need to loop the same number of times - so we continue to iterate as long as j is less than i + 1 (rather than j < i of the previous implementation).

If I now compile our benchmarking program to use this new BubbleSort() implementation, we notice a huge performance increase - or at least I did on my machine. The average execution time for this new implementation seems to be about 51 seconds. That sure beats the pants off 103 seconds, doesn't it?

Even so, 51 seconds to sort 100,000 items is a long wait, especially since they are just simple integers. There's only one more optimization I can think of: if you think about how Bubble Sort works, you'll notice that if we ever get through one of those N-1 passes without having to swap any items, it is safe to conclude that we're done and we can skip the remaining passes. I'll leave this optimization as an exercise for my readers rather than posting a new implementation here.

Since I don't see any other real obvious optimizations that we can make which would drastically improve performance, I think it's time we consider another sorting algorithm.

To learn more about sorting, I would recommend reading The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition)

Saturday, February 24, 2007

Good Luck, Bad Luck

Things are not always as bad as they first appear.

There once was a farmer living in the country with his wife and son. One day, one of their horses escaped from the stable.

Bad luck, their neighbors said.

The father took his son to go look for the horse. In so doing, they came across a dozen wild horses which they rounded up and brought back with them.

Good luck, they thought.

A few days later, while trying to tame one of these wild horses, the son broke his arm.

Bad luck?

The very next day, the army marched through town, gathering all the able-bodied young men into its ranks. The son was injured and so wasn't drafted.

Good luck.

A few months ago, I lost my job. Bad luck, you ask? Not at all. When I was laid off, I had been feeling burnt out and really needed a good break from programming. The two and a half months of unemployment gave me the necessary time to regain my passion. As it turns out, I was hired by one of my best friends and mentors, Miguel de Icaza, to join him on his Mono team (which just so happens to be the "dream job" I had wanted before getting laid off).

Code Snippet Licensing

All code posted to this blog is licensed under the MIT/X11 license unless otherwise stated in the post itself.