CSC153, Class 52: Implementing Dictionaries with Hash Tables

Overview:
* Arrays vs. dictionaries
* Review: Running time of first dictionary implementations
* Key idea: Hashing
* Writing hash functions
* An example
* Hashing in Java
* Handling conflicts
* Removing values

Notes:
* Send me your extra credit summary.
* Questions on the exam?
* Four people are about to lose their chance at an A because part of your grade is on attendance. (Mostly a joke.)

----------------------------------------

How are arrays and dictionaries similar?
* Both are indexed collections.

How are arrays and dictionaries different?
* Different kinds of indexes (numbers vs. "anything")
  + Numbers are sequential.
  + Arbitrary objects are not.
* With arrays, you expect O(1) add and get.

Two implementations of dictionaries so far:
* As an unsorted list (new pairs go at the front)
  + add: O(1)
  + get: O(n)
* As a sorted array
  + add: O(n)
  + get: O(log n)

Cool ideas:
* We need to use arrays, since they're the only data structure with O(1) add and get (other than the linear structures, which clearly don't help here).
* So ... suppose we could convert every object to a number.
  + Note: We're turning keys into numbers.
* We could build a ginormous array and use those numbers directly as indices. Lots and lots of space.
* We could build a smaller array and convert the numbers to valid indices for that array.
  + Mod by the array size to get the index.
* In either case, we have the problem that two unequal objects may end up with the same index (a conflict).
* Goal: Choose the numbers (and the array size) so that such conflicts are uncommon.

Resolving conflicts:
* Put a list or array in each cell of the array.
* Use nested hash tables.
* Keep stepping through the array until you find an empty space.

If we use the third technique and make the array twice as large as the number of elements, then on average we need to look at only about two cells to find something or to reach an empty space.
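The first conflict-resolution technique above, a list in each cell (usually called separate chaining), can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not a production map: the class names ChainedDictionary and ChainedDictionaryDemo are hypothetical, the array size is fixed (no resizing), and keys are assumed to be non-null.

```java
import java.util.LinkedList;

// Hypothetical sketch of "put a list in each cell" (separate chaining).
// Assumptions: fixed array size (no resizing) and non-null keys.
class ChainedDictionary<K, V> {
    // One (key, value) pair stored in a cell's list.
    private static class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final LinkedList<Entry<K, V>>[] cells;

    @SuppressWarnings("unchecked")
    ChainedDictionary(int size) {
        cells = (LinkedList<Entry<K, V>>[]) new LinkedList[size];
        for (int i = 0; i < size; i++) {
            cells[i] = new LinkedList<>();
        }
    }

    // Turn the key into a number, then mod by the array size.
    // Math.floorMod keeps the index non-negative even if hashCode() is negative.
    private int indexFor(K key) {
        return Math.floorMod(key.hashCode(), cells.length);
    }

    // Add a new pair, or replace the value if the key is already present.
    void put(K key, V value) {
        LinkedList<Entry<K, V>> cell = cells[indexFor(key)];
        for (Entry<K, V> e : cell) {
            if (e.key.equals(key)) {
                e.value = value;
                return;
            }
        }
        cell.add(new Entry<>(key, value));
    }

    // Return the value stored for key, or null if there is none.
    V get(K key) {
        for (Entry<K, V> e : cells[indexFor(key)]) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }
}

class ChainedDictionaryDemo {
    public static void main(String[] args) {
        ChainedDictionary<String, Integer> d = new ChainedDictionary<>(16);
        d.put("Brian", 44);
        d.put("Shob", 44);
        d.put("Brian", 45);                  // replaces the old value
        System.out.println(d.get("Brian"));  // prints 45
        System.out.println(d.get("Kath"));   // prints null
    }
}
```

Conflicting keys simply share a cell's list, so get stays fast as long as the lists stay short, which is why the choice of hash function and array size still matters.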
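The third technique, stepping forward until an empty cell turns up (usually called linear probing), can be sketched like this. Assumptions, all hypothetical: the class names ProbingDictionary and ProbingDemo, non-null keys, and a caller who sizes the array at about twice the number of elements, as the notes suggest. The sketch never resizes and throws if the array fills up.

```java
// Hypothetical sketch of "keep stepping until you find an empty space"
// (linear probing). Keys and values live in parallel arrays; a null key
// marks an empty cell. Assumptions: non-null keys, no resizing.
class ProbingDictionary<K, V> {
    private final Object[] keys;
    private final Object[] values;
    private int count = 0;

    ProbingDictionary(int size) {
        keys = new Object[size];
        values = new Object[size];
    }

    // Start at the key's home cell and step forward until we find
    // the key or an empty cell. Returns -1 if the array is full and
    // the key is absent.
    private int findSlot(Object key) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        for (int steps = 0; steps < keys.length; steps++) {
            if (keys[i] == null || keys[i].equals(key)) {
                return i;
            }
            i = (i + 1) % keys.length;  // wrap around at the end
        }
        return -1;
    }

    void put(K key, V value) {
        int i = findSlot(key);
        if (i < 0) {
            throw new IllegalStateException("table is full");
        }
        if (keys[i] == null) {
            keys[i] = key;
            count++;
        }
        values[i] = value;
    }

    @SuppressWarnings("unchecked")
    V get(K key) {
        int i = findSlot(key);
        return (i < 0 || keys[i] == null) ? null : (V) values[i];
    }

    int size() {
        return count;
    }
}

class ProbingDemo {
    public static void main(String[] args) {
        ProbingDictionary<String, Integer> d = new ProbingDictionary<>(8);
        d.put("Bria", 30);
        d.put("Ogec", 30);
        System.out.println(d.get("Ogec"));  // prints 30
    }
}
```

Removal is deliberately left out: simply clearing a cell would break the chain of steps for any key stored past it, so removing values needs extra care with this technique.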
Terminology:
* These implementations of dictionaries are called "hash tables."
* The function used to compute the numbers/indices is called a "hash function."

By throwing lots of memory at the problem, we get "likely" O(1) add and O(1) get.

Designing good hash functions can be hard. Consider strings:
* Convert each letter to a number.
* Sum the numbers.

What makes a hash function "good"?
* Few conflicts
* Fast to compute

Suppose we use A: 1, B: 2, C: 3, D: 4, E: 5, F: 6, ...

Exercise: Figure out the hash value of your full first name, and of the first four letters of your first name.

SAMUEL = 19 + 1 + 13 + 21 + 5 + 12 = 71
SAMU   = 19 + 1 + 13 + 21 = 54

Size-16 array (index = hash value mod 16):
   0: (64, Arjun) (80, Cassandra)
   1:
   2: (50, Arju)
   3:
   4: (36, Davi)
   5: (53, Shobha)
   6:
   7:
   8: (40, David) (40, Kath)
   9:
  10:
  11: (91, Katherine)
  12: (44, Brian) (44, Shob)
  13:
  14: (30, Bria) (30, Ogec)
  15: (47, Ogechi)

The average letter value is (1 + 26) / 2 = 13.5, so the sums for names of length n cluster around 13.5 * n, and names of similar length tend to collide.

Solution: Choose a better distribution of numbers (e.g., the primes):
A: 2, B: 3, C: 5, D: 7, E: 11, F: 13, G: 17, H: 19, I: 23, J: 29, K: 31, ...

Hashing in Java:
* If you design a class, you're expected to write a hash function (the hashCode method) for that class.
* Goals of that hash function:
  + Two equal objects must have the same hash value.
  + The same object must have the same hash value every time, as long as it hasn't changed.
  + Two unequal objects should (ideally) have different hash values.

Good practice:
* If you expect someone might use your objects as keys, write hashCode.
* If you expect someone to compare your objects, write equals.
* If you write equals, write hashCode.
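The letter-sum hash from the class exercise, plus the "mod by the array size" step, can be checked with a short Java sketch. The class name LetterSumHash and its methods are hypothetical. The code assumes names contain only the letters A to Z (either case). The notes list prime letter values only for A through K; continuing through the first 26 primes for the remaining letters is an assumption of this sketch.

```java
import java.util.Locale;

// Hypothetical sketch of the class exercise: hash a name by summing
// letter values, then mod by the array size to get an index.
// Assumes the name contains only the letters A-Z (any case).
class LetterSumHash {
    // Scheme from the exercise: A = 1, B = 2, ..., Z = 26.
    static final int[] SEQUENTIAL = new int[26];
    // Alternative from the notes: A = 2, B = 3, C = 5, ...
    // (the notes stop at K; the rest continue the sequence of primes).
    static final int[] PRIMES = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
                                 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101};

    static {
        for (int i = 0; i < 26; i++) {
            SEQUENTIAL[i] = i + 1;
        }
    }

    // Sum the value of each letter. Locale.ROOT avoids locale-specific
    // upper-casing (e.g., Turkish dotted I).
    static int letterSum(String name, int[] values) {
        int sum = 0;
        for (char c : name.toUpperCase(Locale.ROOT).toCharArray()) {
            sum += values[c - 'A'];
        }
        return sum;
    }

    // Mod by the array size to get a valid index.
    static int index(String name, int[] values, int arraySize) {
        return letterSum(name, values) % arraySize;
    }

    public static void main(String[] args) {
        System.out.println(letterSum("SAMUEL", SEQUENTIAL));  // 71
        System.out.println(letterSum("SAMU", SEQUENTIAL));    // 54
        System.out.println(index("Brian", SEQUENTIAL, 16));   // 12
        System.out.println(index("Shob", SEQUENTIAL, 16));    // 12, a conflict
        System.out.println(index("Brian", PRIMES, 16));       // 4
        System.out.println(index("Shob", PRIMES, 16));        // 8
    }
}
```

With these particular names, switching to prime letter values separates Brian and Shob. Any sum-based scheme, though, still gives anagrams the same hash value, whatever the letter values are.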
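Following the good-practice rules for hashing in Java, a class that might serve as a key overrides equals and hashCode together, and both use the same fields. The Point and PointDemo classes below are hypothetical examples. The hash combines the fields with java.util.Objects.hash, a standard-library method available since Java 7.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key class following the rules in the notes: equal objects
// get equal hash values, and the hash depends only on the fields that
// equals compares. The fields are final, so a Point's hash value cannot
// change while it sits in a table.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Point)) {
            return false;
        }
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // Combine exactly the fields that equals compares.
        return Objects.hash(x, y);
    }
}

class PointDemo {
    public static void main(String[] args) {
        Map<Point, String> labels = new HashMap<>();
        labels.put(new Point(1, 2), "a");
        // A different but equal Point finds the same entry.
        System.out.println(labels.get(new Point(1, 2)));  // prints a
    }
}
```

If Point overrode equals but not hashCode, two equal points would likely get different hash values from Object's default hashCode, so the lookup in PointDemo would probably search the wrong cell and return null.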