Note: This is a repost from an old weblog.
Like Binary Insertion Sort, Merge Sort is a recursive algorithm, but Merge Sort is O(n lg n) whereas Binary Insertion Sort had been O(n2). It works by splitting up the set into halves until each set is down to 1 item. It then merges them back together again in sorted order. It's a very simple concept although a bit hard to explain in words. Allow me to illustrate instead:
Given: [6 8 1 3 7 2 5 4]
The first thing we do is split this set in half:
[6 8 1 3] [7 2 5 4]
We keep splitting in half until we can't split anymore:
[6 8] [1 3] [7 2] [5 4]
[6] [8] [1] [3] [7] [2] [5] [4]
We now merge the items back into their previous sets in sorted order:
[6 8] [1 3] [2 7] [4 5]
And again:
[1 3 6 8] [2 4 5 7]
And again:
[1 2 3 4 5 6 7 8]
Not so bad, right? So how do we implement this? Well, breaking the sets into halves is fairly straight forward so I won't bother explaining that. The “hard” part is the merging, and even that's not so hard. Let me illustrate step by step how we merged 2 sets back into a single set:
set #1: [2 7]
set #2: [4 5]
We start by comparing the first item in each set. Is 2 less than 4? Yes, so we put 2 into our “merged” set:
set #1: [7]
set #2: [4 5]
merged: [2]
Now compare 7 and 4. Since 4 is less than 7, we append 4 into our merged set:
set #1: [7]
set #2: [5]
merged: [2 4]
Now compare 7 and 5. Since 5 is less than 7, we append 5 into our merged set:
set #1: [7]
set #2: []
merged: [2 4 5]
Since set #2 is out of items, we just append the remainder of set #1 into our merged set:
merged: [2 4 5 7]
Tada!
As you can probably see, it looks like we'll need a temporary array to hold our merged set as we merge the 2 subsets back together. If we want to keep to the same API we've been using for the previous sorting routines, we'll need a second function that will do most of the work. Since we know we'll never need our temporary array to be larger than the original input array, we might as well create it in our main function and pass it off to our recursive worker function:
void MergeSort (int a[], int n) { int *b; if (!n || !(b = malloc (sizeof (int) * n))) return; msort (a, b, 0, n - 1); free (b); }
As you can see here, I called my worker function msort(), which takes as input the original array, the temporary array (b[]), the offset of the first item, and the offset of the last item. Now let's take a look at msort()'s implementation:
static void msort (int a[], int b[], int lo, int hi) { register int h, i, l; int mid; if (lo >= hi) return; mid = lo + ((hi - lo) / 2); msort (a, b, lo, mid); msort (a, b, mid + 1, hi); for (i = 0, l = lo, h = mid + 1; l <= mid && h <= hi; i++) { if (a[l] <= a[h]) b[i] = a[l++]; else b[i] = a[h++]; } while (l <= mid) b[i++] = a[l++]; while (h <= hi) b[i++] = a[h++]; memcpy (a + lo, b, sizeof (int) * i); }
You'll notice that the first thing we do is find the midpoint of our set starting with lo and ending with hi. We then call msort() on each of the 2 resulting subsets (lo to mid, and mid + 1 to hi). After msort() is finished with each of those subsets, the 2 subsets will be in sorted order and will be stored back in array a[] at positions lo through mid and mid + 1 through hi.
We then need to merge the 2 subsets back together again into a sorted set starting at position lo and ending with position hi. We'll use i to represent the next position to add an item into our temporary array, b[]. We'll use l and h to represent the cursor for each of the subsets (l for low and h for high). Notice that I check a[l] <= a[h] rather than a[l] > a[h], this is so that items with the same value in the array are in the same order they were in originally. For integer arrays, this is no big deal – but it is often a preferred feature when sorting arrays of custom data. Sorting algorithms that have this property are called “stable sorts”.
The for-loop merges 1 item from one of the 2 subsets into the temporary array until one of the subsets becomes empty. At this point, one of the 2 while-loops will be executed, appending the remainder of the non-empty set to the merged set.
Now that we've merged the 2 subsets back together into our temporary array, b[], we need to copy the result back into a[] at the subrange we started with.
And there you have it.
Since I'm sure you're as anxious to find out how fast this is compared to our previous sorts for sorting 100,000 items as I am, let's give it a whirl and find out. Wow, that was fast – it averaged 0.044s for sorting that 100,000 item array of random integers. That's the fastest time yet! Now that we have such a fast algorithm for sorting, we'll need to bump up our array size in order to compare further sorting algorithms. Let's go for 10,000,000. This time I get 6.344s which is still faster than most of our previous sort algorithms at 100,000 items!
The first optimization I see is that we could replace those while-loops with memcpy()'s. Turns out, that doesn't really change the results for our particular input (it might make a difference if the majority of values of one set or the other was larger than those in the other, but this wouldn't be the case for our input).
To learn more about sorting, I would recommend reading Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition)
Update: I've written a follow-up article on Optimizing Merge Sort that may be of interest as well. This follow-up article deals with optimizing a general-purpose Merge Sort implementation to be on par with the libc qsort() function.