sort : Java Glossary

*0-9ABCDEFGHIJKLMNOPQRSTUVWXYZ (all)

sort

Putting items in ascending or descending order by key fields. The English language meaning of sort is simply to categorise, e.g. to put all your bills into folders by category, e.g. all the electric bills together, all the phone bills together. An internal sort is done strictly in RAM (Random Access Memory). An external sort uses hard disk temporary files. A sort is stable if it naturally preserves existing order when you have duplicate keys. If you want existing order preserved and you are using an unstable sort, you must add an extra sequence number key to your records and use that as a secondary tie-breaking key.

Internal Sort Bad Sorts
Gotchas Ordered Collections
Comparators Descending/Inverse/Reverse Order with reverseOrder
Predefined Comparators Ascending/Descending Indicators
Comparables Generic Sorts
Sorting Speed Complete Example
External Sort Learning More
Merge Sort Links
Tag Sort

Internal Sort

Here is James Goslings’s QuickSort.

In the Java version 1.2 or later there are built-in sorts such as Arrays.sort and Collections.sort See java.util.Comparator and java.util.Comparable for different sorting orders. For String sorting in various international orders see java.text.Collator, java.text.RuleBasedCollator, and java.text.CollationKey.

In

import java.util.Arrays;
//...
// works for:
// int[] myArray;
// Object[] myArray;
// Dog[] myArray;
// byte[] myArray;
// etc.
Arrays.sort( myArray );
or an ArrayList (or any other Collection) like 
import java.util.ArrayList;
import java.util.Collections;
...
Collections.sort( myArrayList );

Note that there is no such method as ArrayList.sort.

Gotchas

Comparators

For more complex sorts you need a class that implements Comparator. Often you can piggyback that code on some other class. The following example does the same as the code above, but could be extended to sort by several fields. To sort in descending order, simply write a Comparator with a compare that returns the negative of the value it does now i.e.
return -( (String )a ).compareTo( (String)b );

If you need a Comparator for sorting Strings in a case-insensitive way, you can use the pre-build Comparator String.CASE_INSENSITIVE_ORDER. It is effectively the same as if you had asked the Comparable/Comparator Cutter Amanuensis to

Predefined Comparators

You don’t always have to write your own Comparator. Oracle provides some predefined ones such as String.CASE_INSENSITIVE_ORDER. This is a Comparator, though it is a constant, namely a static final member of the String class that can only be used to sort Strings.

There is also Collections.reverseOrder (Comparator) which can take any Comparator and create one that sorts in reverse order.

Comparables

If you write a class, you can define its natural sort order, by having that class implement the Comparable interface. Then you can sort arrays or ArrayLists of your Objects without needing to specify a Comparator. Instead of a compare method, an Comparable interface has a compareTo method.

Sorting Speed

There is considerable overhead to set up a sort, of even a couple of elements. So don’t sort every time you add or change an element. Wait till all your changes are done, then do the sort. You can bypass the sort altogether if the size < 2.

In the following table, speed is measured relative to RadixSort on a 100,000 item set of 5-character random keys of the form A9999 where SunSort is arbitrarily assigned 100. Tests done under Java 1.4.1 java.exe on a 233 MHz Pentium 128 MB RAM. Bigger is better.

Sort speed
(bigger is better)
stable? notes
SunSort (Java’s Built-In Sort) 100 yes

This is a most excellent sort. It uses a modified QuickSort for primitives and a MergeSort (sometimes called a tournament sort for objects). The merge sort makes a clone of the array (but not the objects) before it starts, so be aware of the memory usage. There is no reason to use any but this sort except for very large sorts with keys of 4 characters or less where RadixSort might beat it out. Of course you can’t use this in the older Java versions before it was available. Invoke with Arrays.sort and Collections.sort. Beware when using the versions with startIndex and endIndex. The endIndex points one past the last element. Here is how you use it:

RadixSort 86 yes Good for very large sorts, takes same time per item no matter how many items. Ordinary sorts take more time per item as the number of items goes up. More complicated Comparator routine to write. Likes short keys. If you double the length of the key the sort time doubles.
HeapSort 48 no Works best when data are almost already sorted.
ShellSort 34 no Small and quick-loading. Good for small sorts under 2000 items.
QuickSort 14 no Likes random data to sort. Order in the data can cause it to take a very long time to sort. Not good on very small sorts. Varies wildly in time to sort, sometimes a speed demon, sometimes a pig. It just depends on the luck of the draw. I’m told the version I have could be made much better with a few tweaks.

External Sort

If you have too much data to sort to fit in RAM at once, you will need to use an external sort such as OptTech neé Opt-Tech Data Processing. It works by sorting RAM-fulls of records, then doing an N-way merge on the sorted batches, to create fewer larger sorted batches. Eventually it ends up with one large sorted batch.

Abundance uses Op-Tech sort. It rolls itself out to disk, calls in Opt-Tech sort, giving it all of RAM, then rolls itself back in.

You can find various sorts that can handle data too big to fit in RAM by googling for Java big sort. Usually such big sorts use an N-way Merge Sort algorithm.

Merge Sort

A merge sort is used primarily in large sorts where the data are to big to fit in RAM. You sort bundles and write them to disk, using any of the usual internal sorting techniques, but not a merge sort. Then you merge N bundles, reading them sequentially and interleaving them into one bundle N times as long. You repeat until you have one big bundle. To get maximum speed, to eliminate head motion, your scratch intermediate disk files should be on separate physical drives. You tweak N to use RAM/virtual RAM most efficiently. Merge sorts are not particularly good for internal sorts.

You might experiment with a merge sort to use on a multi-core CPU (Central Processing Unit). You split the data into separate datasets and sort each on a separate threads then merge the results in a single pass.

Tag Sort

One problem you often run into is having two arrays you want to keep in sync when you sort one. There are three approaches.
  1. Combine the two arrays into one, with objects having two or more fields. Write a Comparator or implement Comparable for these new combined objects.
  2. Sorts don’t actually sort the records, they just order an array of references. The objects themselves don’t move. If the things you are sorting are not collected nicely together into objects you can use a tag sort. Tag records are short records consisting of the sorting key fields plus a reference back to the original record. This reference may be an array index or a pointer to the original object. The tag records need not contain any data at all, just a way to find it when the comparator is called.

Bad Sorts

There are a few simple sorts you should almost never use since the take time o(n²). In order words they turn into utter pigs when the number of elements to sort gets large.

One is a selection sort:

  1. Select the largest of the N keys and move its record to the output area.
  2. Select the largest of the remaining N-1 keys and move its record to the output area. etc.

Another is the bubble sort, where you pairwise swap elements in a pass through all the elements to percolate the biggest element to the top. Then repeat for the list not containing the biggest elements so far. etc.

A variant of the bubble sort, that is a tiny bit faster is the bi-directional sort that percolates the smallest element to the bottom and the biggest element to the top in alternating direction passes, gradually working toward the middle.

Another is the insertion sort, where you create a new list and gradually add to it, always keeping the new list in sorted order by inserting the new element in the place where it belongs. You find the place by binary search, and slide all the subsequent elements down to make room for it. You sometimes have to use this type of sort when you don’t have all the elements at the start of the sort. They are just given to you one at a time, and you want a sorted list immediately output after each element is added.

Ordered Collections

Instead of sorting, you can maintain a set in order with TreeSet or its brother TreeMap. Whether you go the TreeSet route or the export with toArray and sort route depends on whether you need sorted order only on occasion. If you do, the export method is better. If you need the sorted order maintained up to the second, then the TreeSet method is better. The TreeSet method consumes more memory and more CPU time, and is perhaps a trifle more complex to code. With either technique, it is possible view the same data in several sorted orders by sorting the exported array in different orders or by maintaining multiple TreeSets on the same data objects.

Descending/Inverse/Reverse Order with reverseOrder

Sorting ascending order means sorting with the small elements first then the big. This is usual ordering. Descending order means sorting with the big elements first then the small.

If you have a Comparator or Comparable of some kind, you can convert it into one that sorts into the reverse of the usual order, e. g. if the original sorts alphabetically, the new one will sort in reverse alphabetical order. Here is how you use it:

If you don’t have a suitable base Comparator, just write an ordinary Comparator from scratch and reverse the operands to each compare inside it, or return - result instead of result.

You can’t use Comparators or Collections.reverseOrder to sort arrays of primitives. How could you sort an array of ints in reverse order? Here are some ideas from the mind of Eric Sosman:

Ascending/Descending Arrow Indicators

If you do a GUI (Graphic User Interface), you will want some way of indicating which way your data are sorted ascending or descending. The problem is, no matter what you do you are going to confuse some of you users. Why is this? What does the direction mean to various people?
31 Variations on Ascending/Descending Icons for Sort Order
Name Ascending Descending Size
in Pixels
Votes Notes
Alpha alpha alpha 29 × 27 4 This technique is clear for alphabetic sorts. You need a fairly big icon to make it readable. It would be paired with the numeric icon, also letting you know if you are getting an alpha or numeric collating sequence, or it could be used for both.
Arrow arrow arrow 12 × 16 0 Direction of pointing is opposite to Thunderbird.
Arrow and Bars arrow and bars arrow and bars 31 × 30 1 Arrow and bars.
Bars bars bars 16 × 15 1 The longer bars represent the bigger records.
Caret caret caret 16 × 12 0 Can be kept small. Direction of pointing is opposite to Thunderbird.
Climbing climbing climbing 16 × 16 0 This convention helps break the connection with reading direction, and suggests the climbing association.
Crystal crystal crystal 21 × 19 0 Option for OpenOffice. Open Office allows you to scale the icons to various sizes. Largest variant shown here.
Dice dice dice 22 × 64 0 Bigger numbers on dice indicate direction. Needs to be quite large vertically.
Exclamation ! ¡ 3 × 12 0 This can be done without images, using simple text. ¡ is &iexcl; or '\u00a1'
Dual dual dual 14 × 6 0 Direction of pointing is opposite to Thunderbird. It points up for ascending and gets fatter at the bottom to suggest bigger items are at the end of the list. The user will usually assume the ascending icon is on the left on matter what it looks like. You display a pair. The bright half of the pair indicates the current order. You click the icon anywhere to toggle the direction. You don’t have to click just in the dim triangle.
Galaxy Galaxy Galaxy 22 × 24 0 Default for Open Office. This technique similar to alpha, except the arrow always point down (Tango convention), and the icons are more compact. Open Office allows you to scale the icons to various sizes. Largest variant shown here.
Gnome gnome gnome 32 × 32 0 Like arrow with a side decoration.
Gradient gradient gradient 16 × 16 1 The more intense/darker side of the icon represents the bigger records. The ascending icon suggests the heavier records settling to the bottom.
High Contrast high contrast high contrast 22 × 26 0 Option for OpenOffice. Open Office allows you to scale the icons to various sizes. Largest variant shown here. Works best on a darker background.
Industrial industrial industrial 21 × 19 0 Option for OpenOffice. Open Office allows you to scale the icons to various sizes. Largest variant shown here.
Intellij Intellij Intellij 6 × 16 0 Used in JetBrains Intellij Idea IDE (Integrated Development Environment).
Mac OSX Mac OS Mac OS 7 × 7 0 Direction of pointing is opposite to Thunderbird. It points up for ascending and gets fatter at the bottom to suggest bigger items are at the end of the list.
Music sharp flat 8 × 16 0 Musical sharp and flat. This can be done without images, using simple text if you have a font that supports them. Sharp is '\u266f' and flat is '\u266d'.
Numeric numeric numeric 30 × 28 2 This technique is clear for numeric sorts. You need a fairly big icon to make it readable. It would be paired with the alpha icon, also letting you know if you are getting an alpha or numeric collating sequence.
Ramp ramp ramp 16 × 7 0 Ascending/descending ramp. Ties nicely with the words ascending and descending. Also looks a bit like a graph. Might be confused with a volume control.
Stairs stairs stairs 15 × 12 2 Ascending/descending staircase. Ties nicely with the words ascending and descending.
Tango tango tango 31 × 29 1 Proposed Tango standard. Similar to arrow and bars. In Tango, the arrow always points down. Unfortunately, it takes up a fair bit of screen real estate to be legible.
Text ascending descending 81 × 15 0 ASCII (American Standard Code for Information Interchange) text. Takes up excessive horizontal space.
Thunderbird Thunderbird Thunderbird 8 × 7 0 This is the convention Thunderbird email uses to describe how data are currently sorted. Hard to see on a pale background because it has white elements. What you see here is an improved anti-aliased version of what Thunderbird itself uses. Unfortunately it points down for ascending and gets fatter at the top to suggest bigger items are at the top of the list. I think the intent is either:
  • The arrow points in the direction of the ascending flow of the data.
  • When you click the arrow, it will sort the data in that direction.
Thunder Triangle thunder triangle thunder triangle 16 × 14 0 Could be even smaller. Direction of pointing is same as Thunderbird. Unfortunately it points down for ascending and gets fatter at the top to suggest bigger items are at the top of the list.
Thunder Triangle (small) small thunder triangle small thunder triangle 8 × 7 0 Direction of pointing is same as Thunderbird. Unfortunately it points down for ascending and gets fatter at the top to suggest bigger items are at the top of the list.
Triangle triangle triangle 16 × 14 2 Direction of pointing is opposite to Thunderbird. It points up for ascending and gets fatter at the bottom to suggest bigger items are at the end of the list. The problem with any of the triangle based icons is there are two established contradictory conventions loose in the world. Neither is about to die soon. You are thus guaranteed to confuse people since they must deal with both in the course of a day. That is too bad. The triangle is easy to read even in tiny renderings.
Triangle (small) small triangle small triangle 8 × 7 0 Direction of pointing is opposite to Thunderbird. It points up for ascending and gets fatter at the bottom to suggest bigger items are at the end of the list.
Turtle turtle turtle 30 × 19 0 Descending is associated with inverted. The descending icon has the unfortunate connotation of defunct.
Water Drop water drop water drop 16 × 24 0 The water drop falls down, with the fat part at the bottom.
Vista Vista Vista 4 × 7 0 Microsoft Windows Vista standard. A bit too small to see clearly. Direction of pointing is same as Thunderbird. Unfortunately it points down for ascending and gets fatter at the top to suggest bigger items are at the top of the list. Unfortunately it points down for ascending and gets fatter at the top to suggest bigger items are at the top of the list.

Please let me know at email feedback which one you like best so I can record your vote.

Perhaps some day the association and icon choices will be a configurable part of LAF (Look And Feel) so users will have consistent conventions across all the apps they use. Consistency is the problem. You can find variants by googling view-sort-ascending

I have found the majority up application use an upward arrow to indicate that items are currently displayed in ascending order. No matter what you do, you are going do it backwards for some people, and irritate them. So you might use indicators that don’t have any arrows or at least no confusing up/down arrows. Let the user figure out what your colours or letters A/D mean for example.

You can clarify the meaning of your icons with tooltip help on the icon to say things like:

This still does not resolve the confusion on looking at an indicator when the program the user was just finished using used the opposite convention.

My current favourite is the gradient. It avoids the ambiguity of arrow direction conventions and has a nice physical metaphor. Perhaps I will have to make the icons configurable. Any convention would eventually work, if there were consensus, even something completely arbitrary would work. I am hoping this little essay will help form a consensus and encourage standardisation.

Generic Sorts

If you write a sort for a List, e.g. ArrayList, with proper generics, it will work on collections of any type that supports Comparable or Comparator. To see how to pull it off, have a look at the source for any of my sorts, or Oracle’s sort.

However, because of Java’s lack of orthogonality, your List sort won’t work for arrays of such Objects. You need to write very similar code to do that. Even that array version won’t sort an array of primitives such as long, int or byte. You have to write yet another slightly different version of the sort to handle each type of primitive.

Complete Example

Learning More

Oracle’s Javadoc on Collections.sort : available:
Oracle’s Javadoc on Arrays.sort : available:
Oracle’s Javadoc on Comparable class : available:
Note java.lang.Comparable but java. util.Comparator.
Oracle’s Javadoc on Comparator class : available:
Oracle’s Javadoc on Collections.reverseOrder : available:
Oracle’s Javadoc on Collections.reverseOrder() : available:
Oracle’s Javadoc on Collections.reverseOrder(Comparator) : available:
Oracle’s Javadoc on String.CASE_INSENSITIVE_ORDER : available:

This page is posted
on the web at:

http://mindprod.com/jgloss/sort.html

Optional Replicator mirror
of mindprod.com
on local hard disk J:

J:\mindprod\jgloss\sort.html
logo
Please the feedback from other visitors, or your own feedback about the site.
Contact Roedy. Please feel free to link to this page without explicit permission.
no blog for this page
IP:[65.110.21.43]
Your face IP:[54.163.72.86]
You are visitor number