CSC297.Java, Class 37: Hash Tables, Concluded; On to Dynamic Programming

Admin:
* Questions on homework?
* Sam needs to find good GUI references for Alex and Yvonne
* Sam needs to teach dynamic programming on Friday

Overview:
* Some more notes on Hash Tables
    * Reducing conflicts
    * Other operations
* On to dynamic programming
    * The approximate string matching problem
* On to graphs
    * What is a graph
    * Common graph algorithms
    * Representing graphs: ADT
    * Implementing graphs

Homework questions: Implementing AVL trees
* What methods: Constructor, add, delete, find, size, isEmpty, dump (for testing)
* Should we use BST in this at all?
    * No. You might use BST as a template for your code, but that's about it.
* Should we use BSTNodes?
    * No. Your nodes should keep track of their height (expensive to compute on the fly; relatively cheap to compute during addition and deletion). Your nodes might also keep track of their "balance" (balanced; unbalanced by 1 to the left; unbalanced by 1 to the right). You might also consider keeping track of the size of the subtree rooted at that node.
* Can we use balance rather than height?
    * If you are clever and keep track of only balance, you can *probably* do without height. However, I would not recommend it.
* What do you mean by "unbalanced by 1 to the left"?
    * Assume that the number in parentheses is the height of that subtree

             A
            / \          is unbalanced by 1 to the left
        B(n)   C(n-1)

             A
            / \          is unbalanced by 1 to the right
        B(n)   C(n+1)

* Are these trees balanced?
    * Not completely, but sufficiently
* Can we test BST.java?
    * Yes. Yvonne failed to copy the updated ComparePersonByName.java
* What is going to be on the final?
    * Everything
    * Probably a "write an algorithm"
    * Probably a "design an ADT"
    * Probably a "suggest how you might implement this ADT"
    * Probably a "solve this recurrence relation"
    * Probably an "analyze this algorithm for big-O running time"
    * Probably a "regurgitate some facts you learned" (e.g., "What is a dictionary?")
    * Suppose this is the pattern of operations in your program. What data structure should you use?
    * Explain inheritance in Java.
    * Explain polymorphism in Java.
    * Explain encapsulation.
        * Partially information hiding: an object should provide access to its methods without providing direct access to its data. (This permits the author to modify the representation without affecting use.) "Separate your interface from your implementation" (what it should do from how it does it).
        * Partially logical grouping: group together natural sets of methods and data
    * Write chapter 5 of the book.
    * Knowing me, probably a "debug this code"
    * Possibly a "write your own question; you will be graded on the quality of your question as well as the quality of your answer"
    * You may bring one double-sided 8.5x11 inch handwritten set of notes

Review of Hash Tables
* What is a dictionary?
    * Collection of stuff that is indexed by string.
    * In java.util.Hashtable, a collection of stuff indexed by object
* What is the key idea in hash tables?
    * Use an array for fast access
    * Convert each key to a number, mod by the size of the array, and use that as the index of the key/value pair.
    * In Java, each class is expected to provide a hashCode() method to help you write hash tables.
    * All the cool built-in classes have such a method.
    * Potential problems:
        * If there are more possible keys than possible numbers, you are guaranteed to have conflicts (two different keys with the same number)
        * When you mod by the size of the table, you are likely to have even more conflicts
        * We've decided to handle both kinds of conflict by putting a list (or binary search tree) in each cell in the array
* Running time
    * add: time to compute the number + time to add to the data structure in the cell
        * If we design our hash function well, we compute the number in O(1)
        * For strings, it's really O(length of string), but that is still independent of the number of things in the hash table
        * How long does it take to add to a cell in an array? (Suppose no duplication.) O(1)
        * Danger: What if there are a lot of things in one cell? (Worst case: Everything has the same hash value.) If we're using an array, this is O(n).
    * delete:
        * Similar
    * find:
        * Similar
    * replace:
        * Similar
* As designers, we need to find a way to make sure that there are never too many things in one cell.
    * If so, everything is *constant* time. Way cool!
    * Solutions?
        * Make the hash table fairly big. In particular, if the hash table is about twice as big as the number of elements, you are *unlikely* to have many elements land in the same cell.
        * Or ... when any cell gets more than K elements (choose the K yourself), make a bigger hash table and move everything over
    * Time/space tradeoff
* Sam's observation: If you keep the number of operations in your data structure small, you are more likely to implement it efficiently.
    * Something that needs only addtofront and removefromfront is much easier to implement than something that also needs removefrommiddle
    * Hash tables provide another good example. In general, well-designed hash tables provide O(1) for all the key operations.
    * Suppose you also needed "find smallest"
        * Need to look at *every* element.
        * O(n + m) where n is the number of elements and m is the size of the underlying array

Definite exam questions
* A recurrence relation. (15 minutes)
* An "analyze the running time of this algorithm" one. (15 minutes)
* A "correct this code" one. (30 minutes)

Potential exam questions (20 minutes each); 3 of the following
* As I write the Tao of Java, should I put the section on linear structures (queues and stacks) before or after the section on lists? Why?
* What topics do you expect to see covered in a chapter on linear structures?
* Tell me everything you know about list cursors
* Suppose instead of using arrays/lists/etc. in each cell of a hash table, we instead use the policy that "when adding, you keep looking until you find a free cell." (Here's the implementation.) How would you implement delete?
* Reflect on what you learned about graphs in Combo. Design a graph ADT.
* Explain why there is no body in the Stack interface.
* Here's an algorithm that uses dictionaries. (Sorry, you don't get to see it until the exam.) Which of the implementations of dictionaries would you use? Why?
    * Unsorted list of key/value pairs.
    * Sorted list of key/value pairs.
    * Binary search tree.
    * AVL tree
    * Hash table
* Write and answer your own question. You will be graded on the quality of the question as well as the quality of the answer.
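
Appendix: a hash table sketch

Looking back at the review of hash tables above, the ideas there (convert the key to a number, mod by the array size, keep a list in each cell, and make a bigger table when things get crowded) can be put together in a few dozen lines. What follows is only a minimal sketch under some assumptions, not the code we wrote in class: the names ChainedTable, Entry, put, get, remove, capacity, and smallestKey are all made up for illustration, and the growth rule used here (keep the array at least about twice as big as the number of elements) is one of the two policies mentioned above, chosen arbitrarily.

```java
import java.util.Comparator;
import java.util.LinkedList;

/** Sketch of a hash table with a list in each cell. All names are hypothetical. */
class ChainedTable<K, V> {
    /** One key/value pair stored in a cell's list. */
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private LinkedList<Entry<K, V>>[] cells;
    private int size = 0;

    ChainedTable(int capacity) {
        cells = newCells(Math.max(1, capacity));
    }

    @SuppressWarnings("unchecked")
    private static <K, V> LinkedList<Entry<K, V>>[] newCells(int capacity) {
        LinkedList<Entry<K, V>>[] result = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) result[i] = new LinkedList<>();
        return result;
    }

    /** Convert the key to a number, then mod by the array size.
        floorMod keeps the index nonnegative even if hashCode() is negative. */
    private int indexOf(K key) {
        return Math.floorMod(key.hashCode(), cells.length);
    }

    /** Add a new pair, or replace the value if the key is already present. */
    void put(K key, V value) {
        for (Entry<K, V> e : cells[indexOf(key)]) {
            if (e.key.equals(key)) { e.value = value; return; }
        }
        // Assumed policy: keep the array about twice as big as the element count.
        if (2 * (size + 1) > cells.length) grow();
        cells[indexOf(key)].addFirst(new Entry<>(key, value));
        size++;
    }

    /** Return the value for key, or null if the key is absent. */
    V get(K key) {
        for (Entry<K, V> e : cells[indexOf(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    /** Remove the pair for key; report whether anything was removed. */
    boolean remove(K key) {
        boolean removed = cells[indexOf(key)].removeIf(e -> e.key.equals(key));
        if (removed) size--;
        return removed;
    }

    int size() { return size; }
    int capacity() { return cells.length; }

    /** "Find smallest" has to visit all m cells and all n entries: O(n + m). */
    K smallestKey(Comparator<? super K> order) {
        K best = null;
        for (LinkedList<Entry<K, V>> cell : cells)
            for (Entry<K, V> e : cell)
                if (best == null || order.compare(e.key, best) < 0) best = e.key;
        return best;
    }

    /** Make a table twice as big and move everything over. */
    private void grow() {
        LinkedList<Entry<K, V>>[] old = cells;
        cells = newCells(2 * old.length);
        for (LinkedList<Entry<K, V>> cell : old)
            for (Entry<K, V> e : cell)
                cells[indexOf(e.key)].addFirst(e);
    }
}
```

Note that put, get, and remove each touch only one cell, so they stay fast as long as no cell gets crowded, while smallestKey cannot avoid walking the whole array. That is the "keep the number of operations small" observation in action.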