CSC153, Class 52: Implementing Dictionaries with Hash Tables
Overview:
* Arrays vs. Dictionaries
* Review: Running time of first dictionary implementations
* Key Idea: Hashing
* Writing hash functions
* An example
* Hashing in Java
* Handling conflicts
* Removing values
Notes:
* Send me your extra credit summary.
* Questions on the exam?
* Four people are about to lose their chance at an A because part of
your grade is on attendance. (Mostly a joke.)
----------------------------------------
How are arrays and dictionaries similar?
* Both are indexed collections.
How are arrays and dictionaries different?
* Different kinds of indexes (numbers vs. "anything")
+ Numbers are sequential.
+ Arbitrary objects are not.
* With arrays, you expect O(1) add and get
Two implementations of dictionaries:
* As a list
add: O(1)
get: O(n)
* As a sorted array
add: O(n)
get: O(log n)
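As a rough sketch of the sorted-array version (assuming String keys and values; SortedArrayDictionary and its methods are hypothetical names, not a library class), get does a binary search, while add has to shift later entries to keep the keys sorted:

```java
import java.util.Arrays;

/**
 * Sketch of a dictionary stored as parallel sorted arrays.
 * Hypothetical class; assumes String keys and String values.
 */
class SortedArrayDictionary {
  private String[] keys = new String[8];
  private String[] values = new String[8];
  private int size = 0;

  /** O(n): may shift up to n entries to keep the keys in order. */
  void put(String key, String value) {
    int pos = Arrays.binarySearch(keys, 0, size, key);
    if (pos >= 0) {            // key already present: replace its value
      values[pos] = value;
      return;
    }
    int insertAt = -(pos + 1); // binarySearch encodes the insertion point
    if (size == keys.length) { // grow both arrays when they are full
      keys = Arrays.copyOf(keys, 2 * size);
      values = Arrays.copyOf(values, 2 * size);
    }
    // Shift larger keys one cell to the right to make room.
    System.arraycopy(keys, insertAt, keys, insertAt + 1, size - insertAt);
    System.arraycopy(values, insertAt, values, insertAt + 1, size - insertAt);
    keys[insertAt] = key;
    values[insertAt] = value;
    size++;
  }

  /** O(log n): binary search over the sorted keys; null if absent. */
  String get(String key) {
    int pos = Arrays.binarySearch(keys, 0, size, key);
    return pos >= 0 ? values[pos] : null;
  }
}
```

The list version is the mirror image: add just appends, but get has to scan every entry.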
Cool ideas:
* We need to use arrays, since they're the only data structure we've
seen (other than the linear structures, which clearly don't help)
that has O(1) add and get.
* So ... suppose we could convert every object to a number.
+ Note: We're turning keys into numbers
* We could build a ginormous array. Lots and lots of space.
* We could build a smaller array and convert the object numbers
to valid indices for that array.
+ Mod by the array size to get the index.
* In either case, we have the problem that two unequal objects may
have the same index.
* Goal: Choose numbers (and array size) in such a way that
such conflicts are uncommon.
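A minimal sketch of turning a key into a valid index, assuming the key's number comes from Java's hashCode method (HashIndex and indexFor are hypothetical names). Math.floorMod is used instead of % because hashCode can be negative, and % would then give a negative index:

```java
/** Hypothetical helper: map any key to an index in [0, arraySize). */
class HashIndex {
  static int indexFor(Object key, int arraySize) {
    // Java's % keeps the sign of the left operand, so a negative
    // hashCode would give a negative index; floorMod never does.
    return Math.floorMod(key.hashCode(), arraySize);
  }
}
```

For example, indexFor(3, 16) and indexFor(19, 16) both give 3, which is exactly the kind of conflict described above.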
Resolving conflicts:
* Put a list or array in each cell of the array
* Nested hash tables
* Keep stepping through the array until you find an empty
space.
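The first technique (a list in each cell, often called separate chaining) might look like the following sketch. ChainedDictionary is a hypothetical name, and the sketch assumes a fixed number of cells with no resizing:

```java
import java.util.LinkedList;

/** Hypothetical fixed-size hash table that keeps a list in each cell. */
class ChainedDictionary<K, V> {
  /** One key/value pair stored in a cell's list. */
  private static class Entry<K, V> {
    final K key;
    V value;
    Entry(K key, V value) { this.key = key; this.value = value; }
  }

  private final LinkedList<Entry<K, V>>[] cells;

  @SuppressWarnings("unchecked")
  ChainedDictionary(int numCells) {
    cells = (LinkedList<Entry<K, V>>[]) new LinkedList[numCells];
    for (int i = 0; i < numCells; i++) {
      cells[i] = new LinkedList<>();
    }
  }

  /** The list for the cell this key hashes to. */
  private LinkedList<Entry<K, V>> cellFor(K key) {
    return cells[Math.floorMod(key.hashCode(), cells.length)];
  }

  void put(K key, V value) {
    LinkedList<Entry<K, V>> cell = cellFor(key);
    for (Entry<K, V> e : cell) {
      if (e.key.equals(key)) {   // same key: replace the value
        e.value = value;
        return;
      }
    }
    cell.add(new Entry<>(key, value)); // new key (possibly a conflict)
  }

  V get(K key) {
    for (Entry<K, V> e : cellFor(key)) {
      if (e.key.equals(key)) {
        return e.value;
      }
    }
    return null;                 // not found
  }
}
```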
If we use the third technique (stepping through the array) and make
the array at least twice the number of elements, then on average we
need to look at only about two cells to find something or reach an
empty space.
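A sketch of the stepping technique (usually called linear probing), assuming String keys, no removal, and a caller that never stores more than maxEntries entries, so the doubled array always has empty cells. ProbingDictionary is a hypothetical name:

```java
/** Hypothetical hash table that steps forward past conflicts. */
class ProbingDictionary {
  private final String[] keys;
  private final String[] values;

  ProbingDictionary(int maxEntries) {
    // Twice as many cells as entries, so searches stay short.
    keys = new String[2 * maxEntries];
    values = new String[2 * maxEntries];
  }

  /** Index of the cell holding key, or of the first empty cell reached. */
  private int find(String key) {
    int i = Math.floorMod(key.hashCode(), keys.length);
    while (keys[i] != null && !keys[i].equals(key)) {
      i = (i + 1) % keys.length; // step to the next cell, wrapping around
    }
    return i;
  }

  void put(String key, String value) {
    int i = find(key);
    keys[i] = key;
    values[i] = value;
  }

  String get(String key) {
    int i = find(key);
    return keys[i] == null ? null : values[i];
  }
}
```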
Terminology:
* These implementations of dictionaries are called "Hash tables"
* The function used to compute numbers/indices is called a
"Hash function"
By throwing lots of memory at the problem, we get "likely"
(expected, average-case) O(1) add and O(1) get. If many keys
conflict, an operation can still take O(n).
Designing good hash functions can be hard.
Consider: Strings
* Convert each letter to a number
* Sum the numbers
What makes a hash function "good"?
* Few conflicts
* Fast
Suppose we use
A: 1
B: 2
C: 3
D: 4
E: 5
F: 6
...
Figure out the hash value of your full first name
And of the first four letters of your first name
SAMUEL = 19 + 1 + 13 + 21 + 5 + 12 = 71
SAMU = 19 + 1 + 13 + 21 = 54
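The letter-sum hash can be checked with a short sketch (LetterSum is a hypothetical name; ignoring non-letters and treating lowercase like uppercase are assumptions the notes don't address):

```java
import java.util.Locale;

/** Hypothetical letter-sum hash: A=1, B=2, ..., Z=26, then add. */
class LetterSum {
  static int hash(String s) {
    int sum = 0;
    for (char c : s.toUpperCase(Locale.ROOT).toCharArray()) {
      if (c >= 'A' && c <= 'Z') {
        sum += c - 'A' + 1;   // A -> 1, B -> 2, ..., Z -> 26
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    System.out.println(hash("SAMUEL")); // prints 71
    System.out.println(hash("SAMU"));   // prints 54
  }
}
```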
Size 16 array:
0 (64,Arjun) (80,Cassandra)
1
2 (50,Arju)
3
4 (36,Davi)
5 (53,Shobha)
6
7
8 (40,David) (40,Kath)
9
10
11 (91,Katherine)
12 (44,Brian) (44,Shob)
13
14 (30,Bria) (30,Ogec)
15 (47,Ogechi)
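To double-check hand-computed sums and see which names collide, a sketch like the following buckets names into a size-16 array by taking each letter sum mod 16 (NameBuckets and its methods are hypothetical names; the letter values are A=1 through Z=26 as above):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical demo: place names into cells by letter sum mod size. */
class NameBuckets {
  static int letterSum(String s) {
    int sum = 0;
    for (char c : s.toUpperCase(Locale.ROOT).toCharArray()) {
      if (c >= 'A' && c <= 'Z') {
        sum += c - 'A' + 1;
      }
    }
    return sum;
  }

  /** Each cell of the result lists the names that land there. */
  static List<List<String>> buckets(String[] names, int size) {
    List<List<String>> table = new ArrayList<>();
    for (int i = 0; i < size; i++) {
      table.add(new ArrayList<>());
    }
    for (String name : names) {
      // Letter sums are never negative, so % gives a valid index.
      table.get(letterSum(name) % size).add(name);
    }
    return table;
  }

  public static void main(String[] args) {
    String[] names = {"Arjun", "Arju", "Davi", "David", "Shobha", "Shob",
                      "Katherine", "Kath", "Brian", "Bria", "Cassandra",
                      "Ogechi", "Ogec"};
    List<List<String>> table = buckets(names, 16);
    for (int i = 0; i < table.size(); i++) {
      System.out.println(i + " " + table.get(i));
    }
  }
}
```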
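Sums for names of length n should cluster around 13.5*n (the average
of 1 through 26 is 13.5, if letters were equally likely), so many
names crowd into a narrow range of values.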
One possible solution: choose a better distribution of numbers
(e.g., the primes)
A: 2
B: 3
C: 5
D: 7
E: 11
F: 13
G: 17
H: 19
I: 23
J: 29
K: 31
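A sketch of the prime-based variant, assuming the table continues past K with the remaining primes up to Z = 101 (the notes stop at K; PrimeLetterSum is a hypothetical name):

```java
import java.util.Locale;

/** Hypothetical hash: A=2, B=3, C=5, ... (first 26 primes), then add. */
class PrimeLetterSum {
  /** The first 26 primes, one per letter A..Z. */
  private static final int[] PRIMES = {
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
    43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101
  };

  static int hash(String s) {
    int sum = 0;
    for (char c : s.toUpperCase(Locale.ROOT).toCharArray()) {
      if (c >= 'A' && c <= 'Z') {
        sum += PRIMES[c - 'A'];
      }
    }
    return sum;
  }
}
```

It does separate some strings that collide under the 1-through-26 scheme (AD and BC both sum to 5 there, but give 9 and 8 here), though any two anagrams still get the same sum, since addition ignores order.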
Hashing in Java
* If you design a class, you're expected to write a hash function
(the hashCode method) for that class.
* Goals of that hash function:
Two equal objects must have the same hash value.
The same object must return the same hash value each time
(as long as the fields that equals compares don't change).
Two unequal objects should, ideally, have different hash values.
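A quick illustration of why the third goal says "should" rather than "must", using java.lang.String, whose hashCode is documented as s[0]*31^(n-1) + ... + s[n-1]. The class name HashContractDemo is hypothetical:

```java
/** Hypothetical demo of the hashCode goals using String. */
class HashContractDemo {
  public static void main(String[] args) {
    String a = "Aa";
    String b = "BB";
    // Unequal strings can share a hash value: 65*31 + 97 == 66*31 + 66.
    System.out.println(a.equals(b));                  // prints false
    System.out.println(a.hashCode() == b.hashCode()); // prints true
    // Equal strings must agree, even when they are separate objects.
    String c = new String("Aa");
    System.out.println(c.equals(a) && c.hashCode() == a.hashCode()); // prints true
  }
}
```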
Good practice:
* If you expect someone might use your object as a key,
write hashCode
* If you expect someone to compare your objects, write
equals
* If you write equals, write hashCode
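A minimal sketch of these practices, using a hypothetical Point class as a key. Its hashCode uses java.util.Objects.hash over the same fields that equals compares, so two equal points always get the same hash value:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical key class that overrides both equals and hashCode. */
final class Point {
  private final int x;
  private final int y;

  Point(int x, int y) {
    this.x = x;
    this.y = y;
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) return true;
    if (!(other instanceof Point)) return false;
    Point p = (Point) other;
    return x == p.x && y == p.y;
  }

  @Override
  public int hashCode() {
    // Built only from the fields equals compares.
    return Objects.hash(x, y);
  }

  public static void main(String[] args) {
    Map<Point, String> where = new HashMap<>();
    where.put(new Point(1, 2), "here");
    // A different but equal Point finds the same entry.
    System.out.println(where.get(new Point(1, 2))); // prints here
  }
}
```

If Point overrode equals but not hashCode, the lookup with a new but equal Point would usually miss, because the two objects would usually land in different cells.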