## Sunday, February 25, 2007

### Binary Insertion Sort

Note: this is a repost from an old weblog.

The Binary variant of Insertion Sort uses a binary search to find the appropriate location to insert the new item into the output. Before we can implement this variant of the insertion sort, we first need to know and understand how the binary search algorithm works:

Binary Search

Binary searching, for those who don't know (no need to raise your hand in shame), is a method of searching (duh) in which you attempt to compare as few items in the dataset to the item you are looking for as possible by halving your dataset with each comparison you make. Hence the word binary (meaning two). Allow me to illustrate:

Given the following dataset of 9 items in sorted order, find the value 8:

[1 2 3 4 5 6 7 8 9]

The first step is to check the midpoint (there are 9 elements, so the midpoint would be (9+1)/2 which is the 5th element):

[1 2 3 4 5 6 7 8 9]

Well, 5 is not 8 so we're not done yet. Is 8 less than 5? No. Is 8 greater than 5? Yes. So we know that the value we seek is in the second half of the array (if it's there). We are now left with:

[6 7 8 9]

Oops. 4 does not have a midpoint, so we round off any way we want to (it really doesn't matter). (4+1)/2 = 2.5, so for the sake of argument lets just drop the decimal and check the 2nd element (this is how integer math works anyway, so it's simpler):

[6 7 8 9]

Well, 7 is not 8 so we're not done yet. Is 8 less than 7? No. Is 8 greater than 7? Yes. So we know that the value we seek is in the second half of the array (if it's there). We are now left with:

[8 9]

Again we find the midpoint, which in this case is 1.

[8 9]

Does 8 equal 8? Yes! We're done!

How many tries did that take us? Three. Three tries and we found what we were looking for. And, in the worst possible case for this particular array (searching for the 9), we would have been able to do it in 4 tries.

To implement our BinarySearch() function, the easiest way will be to write it using recursion. Recursion is a technique that allows us to break the problem into smaller and smaller pieces (or, the array in this case - as illustrated above) until we arrive at the solution (or, in this case, the item we are looking for). Recursion is implemented by having a function call itself to process a subset of the data, which, may again call itself (repeatedly until the problem is solved).

Notice in my above explanation of how to search the 9-item array using the binary search algorithm, how I continually break the array into smaller and smaller subsets? That's why binary search is often described as a recursive algorithm - you keep repeating the the process on smaller and smaller subsets until your subset cannot get any smaller or until you find what you are looking for.

Hopefully I've explained that well enough, if not - perhaps taking a look at the following implementation will help?

```int
BinarySearch (int a[], int low, int high, int key)
{
int mid;

mid = low + ((high - low) / 2);

if (key > a[mid])
return BinarySearch (a, mid + 1, high, key);
else if (key < a[mid])
return BinarySearch (a, low, mid, key);

return mid;
}
```

Note: To get the midpoint, we use the formula low + ((high - low) / 2) instead of (high + low) / 2 because we want to avoid the possibility of an integer overflow.

In my above implementation, a[] is our integer array, low is the low-point of the subset in which we are looking, high is the high-point of the subset in which we are looking, and key is the item that we are looking for. It returns the integer index of the value we are looking for in the array. Here's how we would call this function from our program: BinarySearch (a, 0, 9, 8);

Remember that in C, array indexes start at 0. We pass 9 as the high-point because there are 9 elements in our array. I don't think I need to explain why we pass 8 as the key.

If we then plug this into our Insertion Sort algorithm from yesterday (instead of doing a linear search), we get a Binary Insertion Sort... almost. There's one minor change we have to make to our BinarySearch() function first - since we are not necessarily trying to find an already-existing item in our output, we need to adjust BinarySearch() to return the ideal location for our key even in the event that it doesn't yet exist. Don't worry, this is a very simple adjustment:

```int
BinarySearch (int a[], int low, int high, int key)
{
int mid;

if (low == high)
return low;

mid = low + ((high - low) / 2);

if (key > a[mid])
return BinarySearch (a, mid + 1, high, key);
else if (key < a[mid])
return BinarySearch (a, low, mid, key);

return mid;
}
```

Okay, now we're ready to plug it in. Here's the result:

```void
BinaryInsertionSort (int a[], int n)
{
int ins, i, j;
int tmp;

for (i = 1; i < n; i++) {
ins = BinarySearch (a, 0, i, a[i]);
tmp = a[i];
for (j = i - 1; j >= ins; j--)
a[j + 1] = a[j];
a[ins] = tmp;
}
}
```

There's a couple of optimizations we can make here. If ins is equal to i, then we don't need to shift any items nor do we need to set a[ins] to the value in a[i], so we can wrap those last 4 lines in an if-statement block:

```void
BinaryInsertionSort (int a[], int n)
{
int ins, i, j;
int tmp;

for (i = 1; i < n; i++) {
ins = BinarySearch (a, 0, i, a[i]);
if (ins < i) {
tmp = a[i];
for (j = i - 1; j >= ins; j--)
a[j + 1] = a[j];
a[ins] = tmp;
}
}
}
```

If you remember from our Insertion Sort optimizations, the memmove() function really helped speed up shifting a lot of items all at once. But before we do that, lets see what kind of times our current implementation takes to sort 100,000 items (just so we have something to compare against after we make that memmove() optimization).

On my AMD Athlon XP 2500, it takes roughly 25 seconds to sort 100,000 random items (using the main() function from our Bubble Sort analysis a few days ago). That's already on par with yesterday's optimized insertion sort implementation and we haven't even plugged in memmove() yet!

I'm pretty excited and can't wait to plug in memmove() to see what kind of results we get, so here we go:

```void
BinaryInsertionSort (int a[], int n)
{
int ins, i, j;
int tmp;

for (i = 1; i < n; i++) {
ins = BinarySearch (a, 0, i, a[i]);
if (ins < i) {
tmp = a[i];
memmove (a + ins + 1, a + ins, sizeof (int) * (i - ins));
a[ins] = tmp;
}
}
}
```

After plugging in memmove(), our BinaryInsertionSort() sorts 100,000 items in about 5.5 seconds! Whoo!

Lets recap: We started off with the goal of sorting 100,000 items using Bubble Sort which took over over 103 seconds to accomplish. We then scratched our heads and optimized the Bubble Sort algorithm the best we could and obtained a result that was twice as fast. Then we traded our BubbleSort() implementation in for another O(n2) algorithm (generally not something I would suggest wasting time on, but for the sake of learning about a bunch of sorting algorithms, why not?). Using InsertionSort(), we got that down to 26.5 seconds which is 4 times faster than our original sort! But wait, then we took our InsertionSort() one step farther and used a binary search algorithm to help us find the ideal location in which to insert our item and we got it down to a whopping 5.5 seconds - 20 times faster than what we started with!

That's pretty awesome if you ask me...

There's still one more optimization we can try on our BinaryInsertionSort() implementation to attempt to get faster speeds, but it may be over your head if you're a beginner so feel free to ignore my following code dump:

```void
BinaryInsertionSort (int a[], int n)
{
register int i, m;
int hi, lo, tmp;

for (i = 1; i < n; i++) {
lo = 0, hi = i;
m = i / 2;

do {
if (a[i] > a[m]) {
lo = m + 1;
} else if (a[i] < a[m]) {
hi = m;
} else
break;

m = lo + ((hi - lo) / 2);
} while (lo < hi);

if (m < i) {
tmp = a[i];
memmove (a + m + 1, a + m, sizeof (int) * (i - m));
a[m] = tmp;
}
}
}
```

The basic idea here was that I wanted to get rid of recursion because, unfortunately, as nice as recursion is for visualizing (and implementing) our binary search, there is a cost penalty for every time we recurse - there is both function call overhead (which slows us down) and a memory overhead (we need to grow the stack). If we're clever enough, we can sometimes bypass the need for recursion like I did above which can often lead to significant performance gains.

In this case, however, we didn't improve the performance all that much (down from 5.5 to 5.2 seconds) but on other architectures (such as SPARC), it may be a lot more noticeable. It may also be a lot more noticeable for larger datasets. In fact, lets try sorting 200,000 items and see what sort of times we get.

Ah, much more noticeable now. With recursion, it takes 46 seconds and without, it takes 37 seconds - so there is definitely an advantage to cleverly working around the need for recursion.

Okay, that's all I've got for Binary Insertion Sort, but there are still plenty more sorting algorithms to try before we call it a day - we've just barely scratched the surface - and so far we've only looked into the O(n2) algorithms - there are others that promise O(n lg n) performance! Anonymous said...

If the language rounds n.5 as n instead of (n + 1), then calling the function like below will cause stack overflow because it will be stuck when low = 4 and high = 5.

BinarySearch (a, 0, 9, 6);

Otherwise, its very concise and told me the importance of grasping the basics. Thanks. Anonymous said...

My apologies, that is due to the behavior of CInt() in VB, where CInt(4.5) = 4 and CInt(5.5) = 6 because the rounding depends on the integer part been odd (floor) or even (ceiling).

Don't use VB. Anonymous said...

I'm trying to use your last piece of code to write my own binary insertion sort; however I'm hitting some snags--new to coding. I'm using Java within Eclipse.

1)What do you use "register" for
and

2)what is sizeOf(int)?--the compiler just yells at me if I leave it that.

Jeffrey Stedfast said...

Hi Etalian,

1) The register keyword is just a hint to the compiler to map the variable to a CPU register. It's just an optimization that is not necessary at all.

2) sizeof (int) is used to measure how many bytes are in an int on your platform. On really old computers it might be 2. Most computers these days are 4 (if you are on a 32bit OS) or 8 (8 if you are on a 64bit OS).

Since you can't really use memmove() in Java, you'll have to use the for-loop from one of the previous implementations. Simply replace the memmove() line with:

for (j = i - 1; j >= ins; j--)
a[j + 1] = a[j];

If binary search works only on sorted arrays , then how can it be implemented on sorting arrays ?

Eric said...

Terribly sorry to dredge up such an old post, but I ported this to javascript today (replacing memmove with the for loop, as per the comments), and it reduced a 10 second sort (using javascript's .sort() method) to easily under a second. Amazing! Props and thanks! I hope you don't mind. I provided a link to this post in the code.

Eric Bowden said...

One thing to keep in mind when porting to javascript is that you have to use Math.floor to truncate the floats when division occurs

function binaryInsertionSort (a)
{
var i, m;
var hi, lo, tmp;
var n = a.length;

for (i = 1; i < n; i++) {
lo = 0, hi = i;
m = Math.floor(i / 2);

do {
if (a[i] > a[m]) {
lo = m + 1;
} else if (a[i] < a[m]) {
hi = m;
} else
break;

m = Math.floor(lo + ((hi - lo) / 2));
} while (lo < hi);

if (m < i) {
tmp = a[i];
for (j = i - 1; j >= m; j--)
a[j + 1] = a[j];
a[m] = tmp;
}
}

return a;
}