Lab: Parallel Data Structures

Assigned:
Tuesday, Feb 27, 2018
Due:
Tuesday, Mar 6, 2018 by 10:30pm
Collaboration:
Work with your assigned partner for this lab. You can use online resources for your lab, provided they do not provide complete answers and you cite them in your code. If you do not know whether it is acceptable use a specific resource you can always ask me.

Overview

In this lab, you will implement three different abstract data types that support concurrent access by multiple threads and write tests to verify that your data structures work correctly under concurrent modification. You will implement a stack, queue, and associative array. You will then test these implementations using the Google Test framework.

Unlike in previous labs, a repository will be created for you to hold the code for this lab. Your repository will be at http://github.com/csc213/parallel-data-structures-USERS, where USERS is replaced by a dash-separated list of all groups members’ Grinnell usernames. Please work directly in this repository, not in a fork of the repository.

Groups

Morning Section

  • Nolan S., Hongyuan Z., and Josh C.
  • Derek W. and Jinlin H.
  • Ben G. and Cameron C.
  • David N. and Ben W.
  • Tapiwa Z. and Henry F.
  • Jasper E. and Joshua T.
  • Sam S. and Linh P.
  • Maisy D. and Myles B.
  • Matt M. and Abyaya L.
  • Greyson B. and Sooji S.
  • Seth R. and Maddie G.
  • Shida J. and Jonathan S.
  • Cara B. and Anaan R.
  • Logan G.

Afternoon Section

  • Will J. and Alissa J.
  • Zachary S. and Rachel N.
  • Mujtaba A. and Caelin B.
  • Zoe G. and Mike Z.
  • Linh B. and Enrique R.
  • Eli M. and Jacob E.
  • Ella N. and Jonathan G.
  • Aleksandar H. and Zander O.
  • Hadley L. and Jong Hoon B.
  • Kamal N. and Shuyi Q.
  • Garrett W. and Dennis C.
  • Philip K. and Pouya M.
  • Eli S. and Mackenzie M.
  • Elizabeth Z. and Logan G.

Questions & Answers

Do we need to grow our hash table for part C?
No. If you use a hash table, I recommend chaining rather than linear probing, otherwise you will have a hard upper limit on capacity.
How do we test whether some accesses can run concurrently?
You probably can’t do this easily. Look over your code and convince yourself that it works.
How do we make threads in our test files?
The TEST(Something, SomethingElse) { line starts a function that Google Test will run. You can call pthread_create from this function just like you would from main.

Part A: Stack

As you likely remember from CSC 161, a stack is a simple data structure that supports two operations: adding items to the top, and removing items from the top. Implement a thread-safe stack with the following interface:

my_stack_t
This type holds all of the information you need to access a stack. It should probably be defined as a struct. As a user of your stack implementation, I should not need to know what is inside this struct.

void stack_init(my_stack_t* stack)
This function takes a pointer to memory that can hold a my_stack_t and initializes it to an empty struct.

void stack_destroy(my_stack_t* stack)
This function takes a my_stack_t and destroys it. This should free any memory allocated by your code with pointers inside the stack, but not the stack parameter itself, since this memory was not allocated by your code. The program using your data structure should not use the stack passed to this function without re-initializing it.

void stack_push(my_stack_t* stack, int element)
Push an integer onto the stack. The provided my_stack_t* must have been initialized with stack_init, and must not have been passed to stack_destroy.

bool stack_empty(my_stack_t* stack)
Check if a stack is empty.

int stack_pop(my_stack_t* stack)
Pop an integer from the stack. The provided my_stack_t* must have been initialized with stack_init, must not have been passed to stack_destroy, and must not be empty. If the stack is empty, stack_pop should return -1. If the stack is not empty, the integer on the top of the stack should be removed from the stack and returned.

Requirements

You are free to choose any implementation method you like for your stack, but you may not assume an upper limit on the number of elements that will be in the stack. That means a static-size array will not work.

You must also make your stack reentrant; it must be possible to have two or more independent stacks that can be pushed and popped independently. This means there should be no global variables in your stack implementation.

Multiple threads may access the same stack at the same time, so you will need to use synchronization inside your stack implementation to maintain the integrity of this data structure. When we think about data structures, it is often helpful to write down invariants—statements that must always be true about the data structure. Your synchronization for the stack data structure must maintain the following invariants:

Invariant 1
For every value that has been pushed onto the stack times and returned by pop times, there must be p-q copies of this value on the stack. This only holds if .
Invariant 2
No value should ever be returned by pop if it was not first passed to push by some thread.
Invariant 3
If a thread pushes value A and then pushes value B, and no other thread pushes these specific values, A must not be popped from the stack before popping B.

Invariant 1 describes the basic requirements for most reasonable data structures: items that are added to the data structure but not removed should still be contained in the data structure. There cannot be a negative number of copies of a value on the stack, so this also implies that a value should never be returned from pop more times than it has been pushed.

Invariant 2 just says the stack is not allowed to invent new values that were never added to the stack. This is another reasonable requirement for most data structures.

Invariant 3 is specific to stacks; it restricts the order in which pop may return values, based on the order these values are pushed. If two values are pushed from one thread they should be popped in the reverse order.

This invariant says nothing about the order of pushes and pops carried out by threads running in parallel. If two threads are pushing onto the stack simultaneously, both values should end up on the stack in some order. If one thread pushes a value while the other thread pops a value, the popping thread could receive the old or new top value from the stack.

Testing

You will use the Google Test framework to test your data structure implementations in this lab. The file partA/tests.cc shows a simple example test. You will need to add tests for each invariant to this file. To run these tests, run your partA executable or type make test to run these tests and report the results.

The provided test code uses ASSERT_EQ to require that two values are equal, and ASSERT_NE to require that two values are not equal. When comparing strings, make sure to use ASSERT_STREQ and ASSERT_STRNE. You can find simple test guidance on the Google Test Primer, and a complete list of comparisons on the Advanced Guide.

Testing Invariants 1 and 2

To test these invariants, create at least two threads that push and pop random values. These threads should keep track of how many times they pushed each value (perhaps a random integer between 0 and 100) as well as the number of times they popped each value. When these threads finish executing some number of pushes and pops, return the counts of pushed and popped values to the main thread. The main thread can then pop the remaining values from the stack and make sure all the totals for pushed values match those of the popped values. At this point you should make sure all the values popped were also pushed, which checks invariant 2.

Testing Invariant 3

This invariant has to do with pushing and popping order. It may be difficult to validate the order of values popped from a stack if you allow unconstrained values to be pushed, but you can be clever about how you choose values to push. Instead of pushing random values, you could create a thread that pushes even numbers in ascending order, and another thread that pushes odd values in ascending order. When you pop these values (most likely from the main thread) you can simply keep track of the last even and odd values you popped from the stack; if you ever see pop return a larger even or odd value than the last one you saw, the invariant has been violated.

To test this invariant, you should create two test cases: one where pushes happen in parallel and pops happen in the main thread, and a second that pushes in the main thread and pops in parallel. This ensures that both pushes and pops can proceed in parallel without unintended reorderings in the stack.

Testing Strategies

I have not included the testing strategies for parts B and C of the lab. You will have to come up with these on your own, although they may be very similar to the tests for part A. In general, constraining the sequence of inputs to a data structure to make it easy to validate outputs is a good approach that you should use whenever possible.

Grading

Your grade for this part of the lab will depend on several factors:

  1. Do your tests adequately check that the data structure invariants are preserved?
  2. Is your data structure properly synchronized?
  3. Is your code concise, clean, and well-documented?
  4. Does your code build without warnings?
  5. Does your code consistently check for error conditions in standard library calls?
  6. Is your program free of memory leaks?

Part B: Queue

Like a stack, a queue has a very simple interface. However, a queue returns elements in the order they were added to the queue, not the reverse. These data structures are often described with the acronyms LIFO (last in, first out) and FIFO (first in, first out) for stacks and queues, respectively. Your queue should implement the following interface:

my_queue_t
This type holds all of the information you need to access a queue. It should probably be defined as a struct. As a user of your queue implementation, I should not need to know what is inside this struct.

void queue_init(my_queue_t* queue)
This function should initialize an empty queue in the space passed in with the queue parameter.

void queue_destroy(my_queue_t* queue)
This function takes a my_queue_t* and destroys it. This should free any allocated memory associated with the given queue. It is not safe to do anything with the my_queue_t* after it has been passed to this function.

void queue_put(my_queue_t* queue, int element)
This function takes a queue that has been initialized with queue_create (and has not been passed to queue_destroy) and an integer to add to the queue. It then adds element to the queue.

bool queue_empty(my_queue_t* queue)
This function returns true if the queue contains no items.

int queue_take(my_queue_t* queue)
This function takes a queue, removes the first element, and returns this element. If the queue is empty, the function should return -1.

Requirements

You are free to choose any implementation method you would like for your queue, provided you do not assume any upper bound on the number of elements in the queue. As with your stack implementation, the queue must be reentrant: independent queue can exist simultaneously, so no globals please.

As with your stack, the queue may be accessed simultaneously by multiple threads. Add synchronization to your queue to maintain the data structure’s integrity. You may want to refer to the reading in locked data structures for a clever implementation technique that allows queue_put and queue_take to run concurrently.

Write invariants for the queue data structure and enforce these invariants using synchronization. You should include your queue invariants in the README.md file in your project. The invariants for the stack data structure may be helpful as you develop your queue invariants.

Testing

Use the Google Test framework to test your queue implementation. Make sure your tests adequately check all of the invariants you specify. Remember that thread interactions are difficult to control, so an improperly-synchronized data structure may pass all of its tests. If you make your tests run longer, with enough concurrent accesses, you are more likely to catch an error.

The included test cases check the basic functionality of your queue. Please leave these in place; add additional test cases to check your invariants.

Grading

Your grade for this part of the lab will depend on several factors:

  1. Are your data structure invariants accurate and reasonably complete?
  2. Do your tests adequately check that these invariants are preserved?
  3. Is your data structure properly synchronized?
  4. Is your code concise, clean, and well-documented?
  5. Does your code build without warnings?
  6. Does your code consistently check for error conditions in standard library calls?
  7. Is your program free of memory leaks?

Part C: Associative Array

One particularly useful abstract datatype is an associative array, also known as a dictionary in python and map in C++ and Java. An associative array works like an array with two major differences: the index into an associative array is not necessarily an integer (we will use strings for this lab), and the “array” does not have a fixed size. We will implement an associative array that stores integers, and uses strings as indices into the associative array.

To access an associative array, you may get the value with a given key, set a key to a particular value, or remove the entry for a given key.

To keep function names short, we will use the name “dict” for this datatype in the interface below. Your associative array should have the following interface:

my_dict_t
This type holds all of the information required to access a dictionary. It should probably be a struct, but as a user of your data structure implementation I should not need to know anything about this type.

void dict_init(my_dict_t* dict)
This function initializes a dictionary in the memory pointed to by dict.

void dict_destroy(my_dict_t* dict)
This function takes a my_dict_t* and destroys it. This should free any allocated memory associated with the dictionary. It is not safe to do anything with the my_dict_t* after it has been passed to this function.

void dict_set(my_dict_t* dict, const char* key, int value)
This function adds or updates an entry in dict. If there is already an entry for key, then its value will be updated. If no entry exists for key, a new one is added.

bool dict_contains(my_dict_t* dict, const char* key)
This function should return true if the dictionary contains a value with the given key.

int dict_get(my_dict_t* dict, const char* key)
This function looks up a key in dict. If there is an entry for key, the last value associated with this key is returned. If no matching entry exists, the function returns -1.

void dict_remove(my_dist_t* dict, const char* key)
This function removes any entry associated with key. If there is no entry for key, this function does nothing. After removing a key from the dictionary, dict_contains should return false for that key, even if it was set multiple times.

Requirements

You are free to choose any implementation method for your associative array, with some reservations. You may not assume any upper bound on the number of elements that will be in a dictionary, your implementation must be re-entrant, and you may not use a linear scan to implement the entire dictionary. Hash tables and binary search trees are both reasonable implementation strategies, but scanning through a linked list or array is not. Please describe the implementation approach you are using for your dictionary in README.md, including both the underlying data structure (hash table, binary tree, etc.) and the synchronization strategy you are using.

Multiple threads may access the same dictionary concurrently, and you are required to allow concurrent accesses whenever possible; two calls to get, for example, should not block each other. The exact details of what kinds of accesses can happen in parallel will depend on your implementation. Document the types of accesses that may will block each other in your README.md, along with the data structure invariants that your locking scheme preserves.

Testing

Use Google Test to check the invariants you specified for your dictionary data structure. Remember that thread interactions are difficult to control, so an improperly-synchronized data structure may pass all of its tests. If you make your tests run longer, with enough concurrent accesses, you are more likely to catch an error.

The included test cases check the basic functionality of your dictionary. Please leave these in place; add additional test cases to check your invariants.

You do not need to test whether certain accesses will proceed in parallel or not. This is especially difficult to check with automated tests, so you will likely need to resort to careful code reading to verify that your synchronization scheme is working.

Grading

Your grade for this part of the lab will depend on several factors:

  1. Are your data structure invariants accurate and reasonably complete?
  2. Do your tests adequately check that these invariants are preserved?
  3. Is your description of allowed concurrent accesses to the data structure accurate?
  4. Is your data structure properly synchronized?
  5. Is your code concise, clean, and well-documented?
  6. Does your code build without warnings?
  7. Does your code consistently check for error conditions in standard library calls?
  8. Is your program free of memory leaks?
  9. Does your synchronization scheme allow concurrent reads without blocking?
  10. Does your synchronization scheme allow concurrent writes to different keys without blocking?