11/12/18
Announcements
2
¨ Submit Prelim 2 conflicts by Thursday night
¨ A6 is due Nov 7 (tomorrow!)
HASHING CS2110
Ideal Data Structure Mystery Data Structure in Your Life
3 4
What do
Data Structure add(val x) get(int i) contains(val x) these data
ArrayList
!(#) !(1) !(#)
structures
2 1 3 0
have in
LinkedList
2 1 3 0
!(1) !(#) !(#) common?
Goal: !(1) !(1) !(1)
AKA add, lookup, search
New Data Structure : Hash Set Intuition behind a Hash Set
5
Idea: finding an element in an array takes constant time
when you know which index it is stored in.
Data Structure add(val x) get(int i) contains(val x) So… let’s place elements in the array based on their
starting letter! (A=0, B=1, …)
ArrayList
2 1 3 0 !(#) !(1) !(#)
LinkedList # of
2 1 3 0
!(1) !(#) !(#) add(“CA”) CA 2
1 st letter
0 1 2 3
HashSet !(1) !(1) !(1)
3 1 2
Expected time 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 … 25
Worst-case: !(#) b CA MA NY OR PA
# of
contains(“DE”) DE 3
AKA add, lookup, search 1 st letter
1
, 11/12/18
What could possibly go wrong? Hash Functions
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 … 25
¨ Requirements:
b AL CA DE FL GA MA NY OR PA
1) deterministic
1 0
2) return a number*
¨ Some buckets get used quite a bit! 1
4 ¨ Properties of a good hash:
¤ called Collisions
3
¨ Not all buckets get used 1) fast
2) collision-resistant
3) evenly distributed
4) hard to invert
* the number is either in [0..n-1] where n is the size of the Hash Set, or
you compute the hash and then % n, constraining it to be in [0…n-1]
Example: hashCode() Can we have perfect hash functions?
9
¨ Method defined in java.lang.Object ¨ Perfect hash functions map each value to a different
¨ Default implementation: uses memory address of index in the hash table
object
¤ If you override equals, you must override hashCode!!! ¨Impossible in practice
¨ String overrides hashCode: ● Don’t know size of the array
s. hashCode() ∶= -[0] ∗ 31456 + -[1] ∗ 31458 + … + -[: − 1] ● Number of possible values far far exceeds the
array size
Do we like this hashCode?
● No point in a perfect hash function if it takes too
much time to compute
Collision Resolution Chaining (1) add(“NY”)
Each bucket is the beginning of a Linked List
Two ways of handling collisions:
1. Chaining 2. Open Addressing
# of
add(“NY”) NY 13
1 st letter
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 … 25
b
CA MA NY OR PA
CO
2
Announcements
2
¨ Submit Prelim 2 conflicts by Thursday night
¨ A6 is due Nov 7 (tomorrow!)
HASHING CS2110
Ideal Data Structure Mystery Data Structure in Your Life
3 4
What do
Data Structure add(val x) get(int i) contains(val x) these data
ArrayList
!(#) !(1) !(#)
structures
2 1 3 0
have in
LinkedList
2 1 3 0
!(1) !(#) !(#) common?
Goal: !(1) !(1) !(1)
AKA add, lookup, search
New Data Structure : Hash Set Intuition behind a Hash Set
5
Idea: finding an element in an array takes constant time
when you know which index it is stored in.
Data Structure add(val x) get(int i) contains(val x) So… let’s place elements in the array based on their
starting letter! (A=0, B=1, …)
ArrayList
2 1 3 0 !(#) !(1) !(#)
LinkedList # of
2 1 3 0
!(1) !(#) !(#) add(“CA”) CA 2
1 st letter
0 1 2 3
HashSet !(1) !(1) !(1)
3 1 2
Expected time 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 … 25
Worst-case: !(#) b CA MA NY OR PA
# of
contains(“DE”) DE 3
AKA add, lookup, search 1 st letter
1
, 11/12/18
What could possibly go wrong? Hash Functions
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 … 25
¨ Requirements:
b AL CA DE FL GA MA NY OR PA
1) deterministic
1 0
2) return a number*
¨ Some buckets get used quite a bit! 1
4 ¨ Properties of a good hash:
¤ called Collisions
3
¨ Not all buckets get used 1) fast
2) collision-resistant
3) evenly distributed
4) hard to invert
* the number is either in [0..n-1] where n is the size of the Hash Set, or
you compute the hash and then % n, constraining it to be in [0…n-1]
Example: hashCode() Can we have perfect hash functions?
9
¨ Method defined in java.lang.Object ¨ Perfect hash functions map each value to a different
¨ Default implementation: uses memory address of index in the hash table
object
¤ If you override equals, you must override hashCode!!! ¨Impossible in practice
¨ String overrides hashCode: ● Don’t know size of the array
s. hashCode() ∶= -[0] ∗ 31456 + -[1] ∗ 31458 + … + -[: − 1] ● Number of possible values far far exceeds the
array size
Do we like this hashCode?
● No point in a perfect hash function if it takes too
much time to compute
Collision Resolution Chaining (1) add(“NY”)
Each bucket is the beginning of a Linked List
Two ways of handling collisions:
1. Chaining 2. Open Addressing
# of
add(“NY”) NY 13
1 st letter
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 … 25
b
CA MA NY OR PA
CO
2