# Outline of Class 46: Hash Tables

## Miscellaneous

• Have a great Thanksgiving! If, for some reason, you need to reach me over Thanksgiving break, I'll be at (xxx) xxx-xxxx.
• A reminder that the Math/CS "I was a seventies Math Junkie" is Tuesday evening at 7:30 p.m. in the Forum Coffeehouse (check the poster around the department to make sure). Hear all of the not-so-clever things we did as faculty. If you can't make it, I'll be happy to share some of mine electronically.

## Hash Tables

• Surprisingly, if you're willing to sacrifice some space and increase your constant, it is possible to build an expected O(1) dictionary.
• How? By using an array, and numbering your keys in such a way that
• all numbers are between 0 and array.length-1
• no two keys have the same number (or at least few have the same number).
• If there are no collisions (that is, no two keys have the same number), the system is simple:
• To insert a value, determine the number corresponding to the key and put it in that place of the array. This is O(1+cost of finding that number).
• To lookup a value, determine the number corresponding to the key and look in the appropriate cell. This is O(1+cost of finding that number).
• Implementations of dictionaries using this strategy are called hash tables.
• The function used to convert an object to a number is the hash function.
• To better understand hash tables, we need to consider
• The hash functions we might develop.
• What to do about collisions.
• Hashing in Java
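
The collision-free scheme above can be sketched in Java as follows. This is only an illustration, not a full implementation: the class name `SimpleHashTable` and its `hash` helper are hypothetical, keys and values are assumed to be strings, and the code assumes that no two keys ever receive the same number.

```java
// A sketch only: it assumes no two keys are ever given the same number.
class SimpleHashTable {
    private final String[] keys;
    private final String[] values;

    SimpleHashTable(int capacity) {
        keys = new String[capacity];
        values = new String[capacity];
    }

    // Hypothetical hash function: maps a key to a number in 0..keys.length-1.
    private int hash(String key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    // Insert: find the key's number and store the pair in that cell.
    void put(String key, String value) {
        int i = hash(key);
        keys[i] = key;
        values[i] = value;
    }

    // Lookup: find the key's number and look in that cell.
    String get(String key) {
        int i = hash(key);
        return key.equals(keys[i]) ? values[i] : null;
    }
}
```

Both operations do a constant amount of work beyond computing the hash function, which matches the O(1 + cost of finding that number) bound.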

### Hash Functions

• The goal in developing a hash function is to come up with a function that is unlikely to map two objects to the same position.
• Now, this isn't always possible (and it is impossible if we have more objects than positions).
• We'll discuss what to do about two objects mapping to the same position later.
• Hence, we usually settle for a hash function that distributes the objects more or less uniformly across the positions.
• It is worth some experimentation to come up with such a function.
• In addition, we should consider the cost of computing the hash function. We'd like something that is relatively low cost (not just constant time, but not too many steps within that constant).
• We'd also like a function that does (or can) give us a relatively large range of numbers, so that we can get fewer collisions by increasing the size of the hash table.
• We might want to make the size of the table a parameter to the hash function.
• We might strive for a hash function that uses the whole range of nonnegative integers, and then mod its result by the size of the table.
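
The last two points can be sketched as a small helper that takes the table size as a parameter. The names `IndexDemo` and `toIndex` are hypothetical. One detail worth knowing: Java's `%` operator can return a negative result when the hash value is negative, so this sketch uses `Math.floorMod`, which always returns a value between 0 and `tableSize - 1` for a positive table size.

```java
class IndexDemo {
    // Reduce an arbitrary int hash value to a valid index in 0..tableSize-1.
    // Assumes tableSize > 0.
    static int toIndex(int hashValue, int tableSize) {
        return Math.floorMod(hashValue, tableSize);
    }
}
```

Because the table size is a parameter, the same hash function can be reused when the table grows.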

#### Hash Functions for Strings

• Let's consider hash functions for strings so that we can better understand these parameters.
• Use a numeric equivalent for the first letter.
• This is fast (one step).
• It doesn't give a very even distribution (there are many more words that begin with S than that begin with Z).
• It only gives a small range (0 to 25, in effect).
• Use a numeric equivalent for the second letter.
• This is still fast.
• It also doesn't give a very even distribution (but we might want to check that).
• Sum the numeric equivalents of all the letters.
• This is a little bit slower and makes the cost of the hash function dependent on the length of the word.
• It gives a wider range.
• However, it also doesn't give a very good distribution, particularly for longer words.
• Why not?
• Because earlier (A-M) and later (N-Z) letters tend to "even out", giving most words a hash value near length*13.
• Multiply the numeric equivalents of various letters by different numbers.
• This is even slower.
• We'd need some care in choosing multiplicands to ensure that it gives a reasonable result.
• It does seem to give a wider range.
• Here's a version that uses the first and second characters, with multipliers alpha and beta:
```
hashval = alpha * (int) str.charAt(0) + beta * (int) str.charAt(1);
```
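• Often, you can generalize the last approach to every character in the string with a loop that multiplies by a fixed value x at each step:
```
int hashval = 0;
for (int i = 0; i < str.length(); ++i) {
    // Multiplying by x gives each position in the string a different weight.
    hashval = hashval * x + (int) str.charAt(i);
} // for
```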
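
To compare the "sum of letters" idea with the multiplying loop, here is a minimal sketch. The class and method names are hypothetical, the letters are numbered 0 (A) to 25 (Z) rather than by character code, and the strings are assumed to contain only ASCII letters.

```java
class StringHashes {
    // Assumption: c is an ASCII letter; A/a -> 0, ..., Z/z -> 25.
    static int letterValue(char c) {
        return Character.toUpperCase(c) - 'A';
    }

    // Sum of the letter values: the cost grows with the length of the word.
    static int sumOfLetters(String str) {
        int hashval = 0;
        for (int i = 0; i < str.length(); ++i) {
            hashval += letterValue(str.charAt(i));
        }
        return hashval;
    }

    // The multiplying loop, with a caller-chosen multiplier x.
    static int weighted(String str, int x) {
        int hashval = 0;
        for (int i = 0; i < str.length(); ++i) {
            hashval = hashval * x + letterValue(str.charAt(i));
        }
        return hashval;
    }
}
```

One way to see the difference in distribution: the sum ignores letter order, so "stop" and "pots" get the same value from `sumOfLetters`, while `weighted` with a multiplier such as 31 tells them apart. For long strings the `int` arithmetic in `weighted` can overflow and wrap to a negative number, so the result should still be reduced to a table position with something like `Math.floorMod`.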

#### An example

• Let's test a few different hash functions on strings.
• We'll number the letters from 0 (A) to 25 (Z).
• In practice, we'd use the ASCII value of the character.
• We'll skip smaller words.
• We'll use a very small table (and mod the computations by that table size).
• Each student will get a short paragraph and should compute hash values for the first ten "non-small" words.
• After computing the hash values, we'll enter the words in a table (on the board).
• We'll use
• First letter
• Second letter
• Sum of letters
• 3*first + 7*second
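
For checking the hand computations, the four functions in this exercise might be sketched as below. The class name, the method names, the table size of 10, and the sample words are all assumptions made for illustration. The code numbers letters from 0 (A) to 25 (Z) and assumes each word has at least two ASCII letters, which fits with skipping the smaller words.

```java
class HashExercise {
    // Assumption: c is an ASCII letter; A/a -> 0, ..., Z/z -> 25.
    static int letterValue(char c) {
        return Character.toUpperCase(c) - 'A';
    }

    static int firstLetter(String word, int tableSize) {
        return letterValue(word.charAt(0)) % tableSize;
    }

    static int secondLetter(String word, int tableSize) {
        return letterValue(word.charAt(1)) % tableSize;
    }

    static int sumOfLetters(String word, int tableSize) {
        int sum = 0;
        for (int i = 0; i < word.length(); ++i) {
            sum += letterValue(word.charAt(i));
        }
        return sum % tableSize;
    }

    // 3*first + 7*second, reduced to a table position.
    static int weighted(String word, int tableSize) {
        int value = 3 * letterValue(word.charAt(0)) + 7 * letterValue(word.charAt(1));
        return value % tableSize;
    }

    public static void main(String[] args) {
        int tableSize = 10;                        // assumed "very small" table
        String[] words = {"HASH", "TABLE", "ARRAY"}; // hypothetical sample words
        for (String w : words) {
            System.out.println(w + ": first=" + firstLetter(w, tableSize)
                + " second=" + secondLetter(w, tableSize)
                + " sum=" + sumOfLetters(w, tableSize)
                + " weighted=" + weighted(w, tableSize));
        }
    }
}
```

Since every letter value is nonnegative, the plain `%` operator is enough here to keep each result between 0 and `tableSize - 1`.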


Disclaimer Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.

Source text last modified Sun Oct 18 13:54:48 2009.

This page generated on Sun Oct 18 13:54:50 2009 by Siteweaver.
