For Strings, Hashtables work pretty much as you would expect if you are familiar with hashing. The only catch is remembering to import java.util.Hashtable (with a small t in table). Here is how you use
The key point for novices to remember is the capacity of the table has to be somewhat larger than the number of elements you plan to look up. The more spare capacity you give, the faster the lookup. There are of course diminishing returns. Typically, you specify about 25% more capacity than you have elements and a load factor of 0.75 which controls automatic growth of the lookup table when it gets too full. More details on that later.The capacity of a table is not the number of elements you plan to add! If you specify a capacity of 100 and a load factor of .75, Hashtable will allocate only 100 slots, not the 133 you need for efficient access. The load factor kicks in only when deciding when it is time to double the size of the table, in this case, just after you have added 75 elements, not in deciding its initial size. It is up to you to leave some breathing room.
Now it is more important than ever to have a high quality hashCode function that has good distribution, especially in the low order bits. It must not favour one bit pattern over another, e.g. not tend to generate hashcodes all ending in binary 0000. In JDK (Java Development Kit) 1.4.1+ there in a rehash function used inside HashMap to improve the quality of your provided hashCode by redistributing the bits. rehash has to be quick, or they might as well have used tables whose lengths are multiples of primes and used the modulus operator.
In theory though, you should round your capacity up to prime number. However, avoid 31 or multiples of 31 since the new hashCode algorithm for Strings in Java version 1.2 or later uses 31 as a magic number in the hashing formula. In a Hashtable, you are only using 4 bytes per additional entry. Here are some primes just larger than various powers of two that would be good choices.
| 11 | 17 | 37 | 67 | 131 | 257 | 521 | 1031 | 2053 | 4099 | 8209 | 16411 | 32771 | 65537 | 131101 | 262147 | 524309 | 1048583 | 2097169 | 4194319 | 8388617 |
To get the first duplicate, before you put, do a check to see if the element is already in the Hashtable, if so, do nothing.
To get the last duplicate, just put. It will replace what was there before. To get both, store an ArrayList as the value. Store each of the duplicates in one of the slots of the ArrayList. This is a bit inefficient for non-duplicates. For them, you could store just the plain Object. Any you might handle the special case of just one duplicate with an array. Your logic might look a bit like this:
Of course are free to simplify the algorithm, going straight to ArrayLists even for singles. My complex recipe is designed to conserve RAM on large lists.
|
|
available on the web at: |
http://mindprod.com/jgloss/hashtable.html |
optional Replicator mirror
|
J:\mindprod\jgloss\hashtable.html | |
![]() |
Please email your
feedback for publication,
letters to the editor, errors, omissions, typos, formatting errors, ambiguities, unclear
wording, broken/redirected link reports, suggestions to improve this page or comments to
Roedy Green :
| |
| Blog | Canadian
Mind
Products
IP:[65.110.21.43] Your face IP:[50.16.132.180] |
|
| Feedback | You are visitor number 223,671. | |