RadixSort : Java Glossary
is a sorting technique that mimics the old card sorters that worked column by column
sending punched cards into different bins. I first heard of the idea from the late
Tom Meikle who was a novice programmer when he independently discovered it back in
1968 when we implemented it for the venerable LGP-30. He
called it DigitSort. Only many years later after teaching
it to scores of people did I learn it was already well known by the name RadixSort.
The radix-distribution sort (which I call here simply
RadixSort) is discussed at length in Knuth’s Art of Computer Programming vol 3
#5.2.5, where he says that it was first published in 1929
in connection with tabulating machines, although it was probably known to the
operators before that and first described in the form included here in 1954. There is also a related radix-exchange sort.
In Java, RadixSort is faster than either
QuickSort or HeapSort in the benchmarks I ran. For small numbers of items,
almost any sorting algorithm will work just fine. There is rarely any reason to use
anything but the built intournament sort, unless the sort is too big to fit in
RAM (Random Access Memory), where you need something like opt-sort.
RadixSort is stable, meaning it preserves existing order of equal keys. It works
in linear time, unlike most other sorts. In other words, it does not bog down when
you have large numbers of items to sort. The time per item to sort is constant. With
other sorts, the time to sort per time increases with the number of items.
Mathematicians would put it that most sorts run in O(n
log(n)) or O(n²) time, where RadixSort runs in O(n) time.
- RadixSort does not work well when you have
very long keys because the total sorting time is proportional to key length and to
the number of items to sort. If you insisted that all records have unique keys,
necessarily the keys would have to be at least log₂(n) bits long, which would tend to defeat the advantage of
RadixSort. RadixSort shines when you have large
numbers of records to sort with short keys.
- Unfortunately, in this particular implementation, 16-bit characters count as two key slots, unless you can guarantee
you never user the high order parts. In other words Unicode takes twice as long to
sort as byte arrays
- The biggest drawback with RadixSort is that
you have to write an unconventional compare routine.
- The other drawback is RadixSort also uses
somewhat more working storage than a traditional sort because it copies references
to the elements to sort back and forth between two arrays for each pass. Many other
sorts make do with one array and some stack space.
If you want a high speed version of RadixSort for a particular application, you will have to inline
the comparison delegate code since delegate.getKeyByteAt,
is where it chews up all the CPU (Central Processing Unit) time.
Since all sorts can use the same Comparator
interface, it is possible to experiment with various sorts to figure out which one
works best for your situation.
There are other ways to implement the same basic algorithm by
chaining the objects to be sorted together, but that tends to be inefficient in
Java since you need separate glue objects to do
the linking unless the items to be sorted themselves implement some chaining