CSC 213, Fall 2008 : Schedule : Lab 8
Goals: In this lab you'll discover just how useful the principle of locality is by experimenting with the effects of caching. As an added benefit, you'll also investigate the "Small World Phenomenon" to find the six degrees of Kevin Bacon.
Background reading:
Collaboration: You will complete this lab in pairs, as assigned by the instructor.
Overview: You will explore the performance of two versions of a matrix multiplication algorithm: one designed to minimize cache misses (utilizing locality), and another designed to maximize cache misses (thwarting locality). Using a variant of matrix multiplication, you will then investigate whether the "six degrees of Kevin Bacon" hypothesis actually holds.
This header file defines the matrix type as a struct matrix_t. Note the fields are simply an array (actually, a pointer to a memory address) *data, and the dimensions of the matrix (rows and cols).
We will assume that the matrix is stored linearly in memory in row-major order. This means that rows of the matrix are stored contiguously. Thus, as we walk along the linear indices of the array, the column index changes the fastest, while the row index changes slowest.
For instance, the following macro would convert a row and column matrix subscript indices to a single linear index into the array:
#define SUB2IND(row,col,numcols) (row) * (numcols) + (col)
We could also change the type of the data stored in the matrix at compile time by changing the value defined by MTX_TYPE. To save on memory for our experiments, we'll use the unsigned short type given here.
This file contains one function, mtxAdd, that takes pointers to two matrices (struct matrix_t*) and stores their computed sum in a third matrix.
To compute this quickly, without doing any array indexing arithmetic, pointers to the matrix data are used. You will write functions that compute and store matrix products. To maximize efficiency, you will also be asked to use pointers to the matrix data.
You won't need to understand all the details of these routines, but they will be helpful to have handy for implementing part B of the lab. Read the comments in matrixutil.h for descriptions.
If A is an mxn matrix and B is an nxp matrix, the definition of matrix multiplication C=AB is given by,
,Since all m*p entries of C must be set, and each of these requires a sum over n values, a simple analysis shows that multiplying two matrices using a direct method such as this is O(mnp), or O(n3) if m=n=p (as with square matrices).
The equation suggests an algorithm that walks along the indices of C, and at each location computes a sum over products. However, more than one ordering of the operations will still give the correct result. Mathematically, they're equivalent. Why on earth would you choose one over the other? Do part A and find out!
Add two different functions to matrixop.{h|c} that each calculate a matrix product. One should be written to minimize the number of cache misses, while the other should be written to maximize the number of cache misses. Both functions should use pointers for all matrix data references (both reading and storing).
If you are thinking of multiplication as a triply-nested for loop (which you probably should be), there are six basic ways you could implement a matrix product. Think about how the data is stored in memory and try to maximize (or minimize, respectively) the number of contiguous accesses (read or write) you make in the computation.
Unless you and your partner are very sure of your pointer programming (and debugging!) skills, you may want to write at least one of these using array indices first, to make sure you've got the right idea. (You can use the macro above to convert subscripts to linear indices.) Once you've done that, it might be somewhat easier to to write a version using pointers.
Write a program that takes n as an argument, and creates two nxn matrices filled with random values (see the function mtxRand(...) in matrixutil.h). Like in Lab 3, you will use the system call getrusage(2) to query the resource usage caused by your matrix multiplication routines.
Use fork(2) to create a child process that calls the appropriate multiplication function (be sure any initialization is done before the fork). The parent process should query the resource usage of the child after the multiply. Your program(s) should report:
The total user time (in seconds)
The total system time (in seconds)
Run your program above for values of n = [2, 20, 50, 100, 200, 500, 750, 1000, 1250, 1500] using both multiply functions. Collect the total compute time for these and order them nicely in a table or use some plotting software (e.g. a spreadsheet, gnuplot, MATLAB, etc.) to give time versus matrix size n for both methods (on the same axes if plotting, side by side if in a table). In addition, you should also give (or create a plot of) the ratio of the two times (e.g., timeMax_Cache_Miss_Mult /timeMin_Cache_Miss_Mult).
Does the big O() analysis given above for matrix multiplication seem to hold for both curves? Explain your answer.
How would you characterize the difference (if any) between the curves (qualitatively and/or quantitatively)?
What is your explanation for any difference between the curves?
A little more matrix and/or graph review (or brief primer, if this is new for you).
A graph is comprised of a set of "nodes" with some "edges" between them. As a simple example, all the cities United Airlines flies to are "nodes,", and there are edges between the cities that they fly between. There are innumerable uses for graphs that crop up all the time in computer science. Most problems under consideration nowadays features graphs that are HUGE and processing them requires some clever thinking on the part of theorists and large-scale programmers (like you).
Formally, a graph G=(V,E) is a set of nodes V={0,1,2,...n} and a set of edges E = { (i,j) | i and j are in V}. This set of edges is simply a bunch of pairs of nodes. In a computer, we might represent it as a list of pairs. Alternatively, we can use a matrix (*gasp!* I bet you never saw that coming) that simply indicates the presence of an edge between two nodes. if E is an nxn matrix, then ei,j=1 means there is an edge between nodes i and j, and ei,j=0 means there is no edge. We call E an adjacency matrix.
For simplicity, we're only going to consider so-called "undirected" graphs here, which for you, means that the adjacency matrix is symmetric, or ei,j = ej,i.
Matrix multiplication between adjacency matrices has a special interpretation. If we consider that ei,j = 1 means there is a direct path from i to j, then if the product of two elements of the adjacency matrix ei,k * ek,j = 1, it means there is a path we can walk from node i to node j. That path requires at most two hops and it goes through node k (by traversing the edge from i to k and then the edge from k to j).
Since E contains binary values, we may think of the times (*)
operation as a logical AND, and the plus (+) operation as a logical
OR. Thus, if we replace every term in the right side of the matrix
multiplication equation above with an e (rather than a
or b), the product C =
E2 = EE can be interpreted (in
English) as "there is a path of length at most two from i to j IF:
Of course, since we're adding up integer types masquerading as booleans, we're only interested in whether the result is zero or not. If it's zero, there is no path. If it's non-zero (it will be positive), then there is a path.
The same rule holds for higher matrix powers and paths as well. That is, if the product E2 is already stored in a matrix and we calculate another product C = E3 = E2 * E, then the ai,k terms in the matrix multiplication equation above correspond to the square of the adjacency matrix, while the bk,j terms correspond to the adjacency matrix itself. If you follow the same interpretive logic, this third matrix power tells us whether there is a path from one node to another of length at most 3. You can go on like this forever (well, eventually you'll stop learning anything new, but you could certainly keep trying to get from one place to another in a graph.)
This leads us to some analysis we'll do to see if there is a path of length six from one very special node to another in a graph.
In the interpretation given above, all we are interested in is whether there is some path between nodes. Thus, if we find a product ai,k * bk,j that is non-zero, we know there is a path and we know that ci,j is non-zero and do not need to calculate any more terms for ci,j.
Write one more special "multiplication" function that re-orders your loops to still minimize cache misses as much as possible, yet can now take advantage of the "early stopping" logic described above by breaking out of a loop once a positive product ai,k * bk,j makes ci,j positive.
The efficiency of your new "multiplication" function will depend on how early in the computation it sees a non-zero product. This depends on the matrix itself and not how efficiently you have done your coding. However, your original cache-miss minimizing matrix multiplication function will still correctly compute the quantity of interest. Perhaps, even, it will be faster than "early stopping".
Why could your "early stopping" multiplication routine take longer than your original multiplication function?
As discussed above, we can find whether there is a path of length at most L between two nodes i and j in a graph by taking the Lth power of the adjacency matrix: EL and testing whether the entry at (i, j) is non-zero.
Assume that it will be faster to use your original cache-miss minimizing function to take the first product of the adjacency matrix with itself (giving you E2), but that subsequent products with E will be faster using your "early stopping" multiplication function.
With this assumption, write a function that computes the Lth "power" of such a matrix. We say "power" in quotes because it is only a boolean power -- the only meaningful trait is where the non-zero values are, not what the non-zero values are.
The files imdb2000.dat, imdb3000.dat, imdb6000.dat, and imdb.dat contain adjacency matrices from data in the Internet Movie Database. Each node is an actor, and there is an edge between actors if they have appeared in a movie together. In each file, the first node represents Kevin Bacon. Each of these (save the last) contains data on a subset of actors from IMDB that have appeared in at least 25 movies. The lattermost contains all actors from IMDB that have appeared in at least 25 movies.
If you are curious about which nodes in the graphs correspond to which actors, you can look up their names by numerical index in the files imdb2000.txt, imdb3000.txt, imdb6000.txt, and imdb.txt. These files list the (one-based) row/column and the corresponding actor ID, which you can then search for in actors.txt to find the corresponding name.
If you use fopen(2) to open the files for reading, you can pass the file stream to mtxRead (from matrixutil.h) to read the matrix to a struct matrix_t.
Use your function from B.3 to compute the 6th power of the largest IMDB actor matrix you have the time or patience for. Counting the number of zeros in the first row of the result (Kevin Bacon's row), will tell you how many, if any, actors are not within the "six degrees of Kevin Bacon."
How many are there? Report which matrix you used.