Hashing in Java:
What is hashing ?
Hashing is a principle/algorithm to generate the unique code(hash code) to store the
objects/keys. Hashmap in Java uses hashing principle only to generate the hash code which
helps to store and retrieve the keys and objects.
What is hashcode ?
Hash code is an unique code generated by the hashing principle with some algorithm/rule
applied on its properties.
Before going further it is recommended to keep these rule/property in mind strongly.
1. If two objects are equal then both will have the same hashcode.
2. But if two hashcodes are same, not mandatory that both the objects should be equal.
Only for rememberence:
Objects = Books
hashcode = Author
if two books are same, surely author of the book also will be same.
But if two book authors are same, book written by the author need not be same.
[Means author would have written many other books also]
[It’s just for better understanding and remembrance, so do not use/tell in interviews]
What is hash function ?
It is a function which generates hash code for the given key/object using any algorithm. It
returns a number for any given object/key.
What is the challenge ?
We should generate a unique hash code all the time using hash function to avoid the
collision. But it is very tough to achieve and it depends on the number of elments in the
How hashmap works in java collection?
Map internally uses an array with the fixed size, so if we have inserted the number of
elements which exceeds the array size then hash function will generate the same bucket
location for two different keys/objects.
[Remember the rule/property 2 given above “But if two hashcodes are same, not mandatory that both the objects should be equal.”]
In this case, collision will occur. so in order to avoid the collision hashmap will have linkedlist in the array.
hashcode -> linkedlist1 | linkedlist2 | linkedlist 3
here, same hashcode generated for different keys, so values are stored in the linkedlist
with the keys. So inorder to identify the exact element then we have to use the
keys.equals() method to generate and match the correct linkedlist values. Because two
hashcodes are same for different objects/keys getting with hashcode can not be done.
put(key,value) works this way,
HashMap<String,String> jdMap = new HashMap<String,String>(); jdMap.put("BlogName","Javadomain.in"); jdMap.put("Author","Naveen");
hashing algorithm/principle is used for hashcode generation for hashmap keys.
hashcode will be generated by hashCode() method for the key (“BlogName“), then with the generated hashcode BlogName and Javadomain.in will be stored in the internal array/linkedlist of the array.
Same way again hashcode will be generated by hashCode() method for “Author” key, then in the respective address Author and Naveen will be stored in the array/linkedlist of the array.
Which is to
provide the better performance by constant retrieval time (O(1)),
O(1) –> It is nothing but, providing the way to retrieve the value by passing the key all
the time directly or once.
It can be achieved only if we have generated the unique hashcode for all the keys in the map. But we know very well, that if two hashcodes are same then it is not needed to have the objects also same. In this case collision will occur.
get(key) works this way,
when you pass a key to get the value in the hashmap, hashmap internally uses hashing
function to generate the hashcode of the given/passed key, with the generated hashcode it
will point out the bucket location which helps to get the value.
For our case,
get(“Author”) => hashcode will be generated for “Author” key and with this generated hashcode value you can directly retrieve the value “Naveen” from the array of the hashmap.
If suppose collision were occurred for the particular hashcode and the map values are stored in the linkedlist, then
get(“Author”) will return many values then key.equals() method will be used to identify the correct value.
Think, if two hashcode matches in the map, then we need to use the keys.equals() method to find the correct object/value. Because linkedlist is used internally inside the array of
the hashmap to avoid the collision.
What is collision ?
Collision is the duplicate hash code generation for two different objects/keys.
As explained above, if we keep on storing the objects/keys in hashmap then we may get the same hash code for two different objects/keys, since hashmap internally uses an array with the fixed size.
As per the rule 2 above,
“But if two hashcodes are same, not mandatory that both the objects should be equal.”
since two different objects can have the same hashcode, retrieving the objects with the
hashcode may not able to identify the correct value.
So along with the hashcode it is mandatory to provide the check for the key keys.equals()
method to identify and retrieve the correct object/value from the linkedlist of the array
which exists in map.
Perfect hash function vs Imperfect has function ?
Perfect hash function will generate the unique hash code for the given key/objects all the
time. If it can generate the unique hash code all the time, then retrieval will be done at
constant time always [O(1)].
Imperfect hash function will fail to generate the unique hash code and same hash code may be generated for two different keys/objects. So finding the stored bucket location will be tough.
Keep always Immutable objects as key in map.
As explained, we will generate the hashcode of the map key and store with the key. so if
the key is mutable object(say stringbuffer or stringbuilder) then can change anytime, then we will get the different hashcode so retrieval may become wrong/invalid.
So it is highly recommended to use only the immutable objects(say String) as a key in the
Hashing in Java 8:
In java 8 it works the same way as discussed above, but the main difference is, if suppose
linkedlist also created to the threshold value(or high hash collision happens) in the same
hash code array bucket. Then performance will not be consistent all the time and it depends
on the given key.
So in Java 8, they provided the way to switch the linked list of entries to balanced tree
after it reaches the high hash collisions. which provides the performance of o(log n)
instead of o(n) for high hash collisions at very worst cases.
832 total views, 4 views today