Sunday, February 27, 2011

Optimizing HashMap and HashSet with load factor


                                  Java Collection interface is one of the most used interface in java, it allows developers to choses variety of flexible data structures. Among those data structures one of the mostly using data structure is HashMap and HashSet as well.The question is are we getting full usage from that most commonly used data structure . This topic will cover some of the main details that you may have forgotten about those two data structure. And keep in mind I'm not going to discuss about when to use those two or what are the functionalities of this two in this artcle.
                              Before we going to the topic it is better to know what are the back ground of those two data structures. Basically HashMap and HashSet are derive from one of the most conman data structure called HashTable so it is better to know why we implement HashTable type data structures. The reason for that is speed of searching particular element . In hashing we try to search a element in a constant time “O(1)” . And the load factor of HashTable directly effect to the searching speed 

                               number of stored elements in the table
      load factor =----------------------------------------------------------------------
                                            size of the hash table 

                            If we able to maintain load factor from range of .7 to .75 we are able to search element form the constant time. But keep in mind the load factor is not the only thing that effect to the searching speed of a element in the hast table.

                           Now we check out how this load factor combine with HashTable and HashMap. In the Hashtable documentation they have mentioned how much load factor we have to maintain to get optimal time for searching a element and they strongly recommended to maintain load factor to 0.75
(check java doc for HashMap and HashSet)

So now its is developers responsibility to select a proper intercalation size for the HashMap or HashTable we can do that easily by using constructor
example
Assume there are 750 Person details


Now we can assume we will get O(1) searching time for a finding person from the data structure.

What happen if load factor exceed than initialize? 
Both HashMap and HashSet behave in same way so im using HashMap as example 

HashMap()
          Constructs an empty HashMap with the default initial capacity (16) and the default load factor (0.75).
HashMap(int initialCapacity)
          Constructs an empty HashMap with the specified initial capacity and the default load factor (0.75).
HashMap(int initialCapacity, float loadFactor)
          Constructs an empty HashMap with the specified initial capacity and load factor. 

So every time we exceed the number of element that can have in the data structure with respect to the load factor , JRE restructure the map according to the load factor .This restructuring is little bit expensive so its better to use this feature carefully .by doing this we can improve the performance of the HashMap.

What happen when load factor is 1?
This is the easiest way to avoid restructuring of the HashMap before it. If declare load factor as
1 the restructuring process only happens when number of elements are exceeding the capacity of the HashMap.

Note : Some may have a question about the wasted space when we use this type of declaration . As you can see if we use 1000 as a initial size we know that 250 memory spaces will be wasted but cost of wasting me space is so cheaper than wasting computational time so it is better to have this kind of declaration. But keep in mind this is not good for every situation and it is developer's responsibility to identify what will be used. As an example if we have no idea about how many data items are to be inserted to the data structure don't initialising when construct.

                That's about how load factor effect to the searching time . But in above I've mentioned that the load factor is only one factor that effect to the performance of the HashMap and HashSet . The initial size of the data structure is also effect to the performance and it will discussing on next article because it also little bit of lengthy topic and we need more knowledge about Hashing for it so it is advise to read more about a hashing technique called bucket hashing
 


 

 

3 comments:

  1. how is load factor effect to the searching time ?

    ReplyDelete
  2. Thanks for the info.

    ReplyDelete
  3. Google Guava Maps and Sets have a method that constructs a non re-sizing collection for you. See newHashMapWithExpectedSize - http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/collect/Maps.html#newHashMapWithExpectedSize(int)

    ReplyDelete