## Sunday, February 25, 2007

### Shell Sort

Note: This is a repost from an old weblog.

The Shell Sort algorithm is designed to move items over large distances each iteration. The idea behind this is that it will get each item closer to its final destination quicker saving a lot of shuffling by comparing items farther apart.

The way it works is it subdivides the dataset into smaller groups where each item in the group is a set distance apart. For example, if we use h to represent our distance and R to represent an item, we might have groups: { R1, Rh+1, R2h+1, ... }, { R2, Rh+2, R2h+2, ... }, ...

We then sort each subgroup individually.

Keep repeating the above process, continually reducing h, until h becomes 1. After one last run-through where h is a value of 1, we stop.

At this point, I'm sure you are wondering where h comes from and what values to reduce it by each time though. If and when you ever figure it out, let me know - you might also want to publish a paper and submit it to ACM so that your name might go down in the history (and algorithm) books. That's right, as far as I'm aware, no one knows the answer to this question, except, perhaps, God Himself :-)

If you are interested, you might take a look at Donald Knuth's book, Art of Computer Programming, Volume 3: Sorting and Searching (starting on page 83), for some mathematical discussion on the matter.

As far as I understand from Sorting and Searching, it is theoretically possible to get the Shell Sort algorithm to approach O(n1.5) given an ideal increment table which is quite impressive.

Knuth gives us a couple of tables to start with: [8 4 2 1] and [7 5 3 1] which seem to work okay, but are far from being ideal for our 100,000 item array that we are trying to sort in this exercise, however, for the sake of keeping our first implementation simple, we'll use the [7 5 3 1] table since it has the charming property where each increment size is 2 smaller than the previous. Yay for simplicity.

```void
ShellSort (int a[], int n)
{
int h, i, j;
int tmp;

for (h = 7; h >= 1; h -= 2) {
for (i = h; i < n; i++) {
tmp = a[i];
for (j = i; j >= h && a[j - h] > tmp; j -= h)
a[j] = a[j - h];
a[j] = tmp;
}
}
}
```

The nice thing about Shell Sort is that, while the increment table is a complete mystery, the algorithm itself is quite simple and well within our grasp.

Let's plug this into our sort program and see how well it does.

I seem to get a pretty consistent 6.3 seconds for 100,000 items on my AMD Athlon XP 2500 system which is almost as good as the results we were getting from our Binary Insertion Sort implementation.

Now for some optimizations. We know that an ideal set of increment sizes will get us down to close to O(n1.5) and that it is unlikely that the [7 5 3 1] set is ideal, so I suggest we start there.

On a hunch, I just started adding more and more primes to our [7 5 3 1] table and noticed that with each new prime added, it seemed to get a little faster. At some point I decided to experiment a bit and tried using a set of primes farther apart and noticed that with a much smaller set of increments, I was able to get about the same performance as my much larger set of primes. This spurred me on some more and I eventually came up with the following set:

{ 14057, 9371, 6247, 4177, 2777, 1861, 1237, 823, 557, 367, 251, 163, 109, 73, 37, 19, 11, 7, 5, 3, 1 }

In order to use this set, however, we need a slightly more complicated method for determining the next value for h than just h -= 2, so I used a lookup table instead:

```static int htab[] = { 14057, 9371, 6247, 4177, 2777, 1861, 1237, 823, 557, 367, 251, 163, 109, 73, 37, 19, 11, 7, 5, 3, 1, 0 };

void
ShellSort (int a[], int n)
{
int *h, i, j;
int tmp;

for (h = htab; *h; h++) {
for (i = *h; i < n; i++) {
tmp = a[i];
for (j = i; j >= *h && a[j - *h] > tmp; j -= *h)
a[j] = a[j - *h];
a[j] = tmp;
}
}
}
```

With this new table of increments, I was able to achieve an average sort time of about 0.09 seconds. In fact, ShellSort() in combination with the above increment table will sort an array of 2 million items in about the same amount of time as our optimized BinaryInsertionSort() took to sort 100 thousand items. That's quite an improvement, wouldn't you say!? Anonymous said...

I've looked at this in a real hurry,so please pardon me if i'm wrong, but I'm not sure if this would work..

for (i = 0; i < n; i++) {
tmp = a[i];
for (j = i; j >= h && a[j - h] > tmp; j -= h)
....
value of j for 1st iteration of inner loop = value of i = 0. The condition j >= h would fail right there since h is 7. And incrementing loop maybe?

for (j = i; j < n-h && a[j + h] < a[j]; j += \$h)
{
swap(a[j], a[j+h]);
}

Jeffrey Stedfast said...

Nah, I just typo'd. The loop should be for (i = h; i < n; i++)

Look at the second implementation to see how it should read.

The first implementation is still correct, it'll just waste cycles until i == h.

Unknown said...