Lab: Memory Allocator

Assigned:
Tuesday, Feb 13, 2018
Due:
Friday, Feb 23, 2018 by 10:30pm
Collaboration:
Work with your assigned partner for this lab. You can use online resources for your lab, provided they do not provide complete answers and you cite them in your code. If you do not know whether it is acceptable use a specific resource you can always ask me.

Overview

For this lab, you will write a simple implementation of the malloc and free functions. This simple memory allocation library will give you a much deeper understanding of pointers and memory management. Building an allocator can be a bit painful: odds are your favorite C functions call a memory allocator, which means you can’t use them inside your allocator or you’ll end up with never-ending recursion.

Please read the following instructions carefully. There are quite a few important details in the implementation of this lab, and getting them wrong can lead to some very difficult-to-diagnose errors.

Groups

Group information is no longer available for this course.

Questions & Answers

Should the block header be the same size, regardless of the size of the objects in that block?
Yes. Having the same header is helpful when you implement xxmalloc_usable_size. Given a pointer to some allocated object, you can round that pointer down to the next lowest multiple of 4096 to find the start of the block. Then, you can treat that address as a block header to find the object size. That only works if all blocks have the same object header structure.
Should we reorder blocks when the program frees something from a block that is not first in the block list?
No. The grader may check this incorrectly, but I will fix it.
When implementing large object support, how do we handle the case where the magic number appears by chance?
Give up.
Are we supposed to catch errors where our block header is overwritten?
No.
We are getting an error that MAP_ANON is undefined.
I think you should use MAP_ANONYMOUS. Make sure you include sys/mman.h.
How do we actually place a block header at the start of a page?
The mmap function returns a pointer to the start of the page. If your block header struct is called block_header_t, you can cast this pointer to a block_header_t* and write to it.
What is the block list array for?
This array holds the head pointers for a set of linked lists, each containing all of the blocks we’ve allocated, grouped into lists based on the size of objects in the block.
What should xxmalloc_usable_size do?
It should return the number of bytes that were actually allocated for an object (passed in as a parameter).
Do we have to zero out memory before we return it from malloc?
No, although it’s not a terrible idea for real allocators to do this.
Can we just give 2048-byte objects a whole page, like we do for larger objects?
No. Adding the block header and freelist makes it possible to free and reuse these objects, unlike large objects.
Does xxfree have to work if the pointer is at the very end of the object?
No, I hope not.

Getting Started

The starter code at this lab is available on GitHub at https://github.com/csc213/malloc. The starter code includes a simple, somewhat-functional heap implementation in allocator.c, along with a wrapper that allows you to replace the default allocator with your custom allocator when you run a program.

The “somewhat-functional” heap uses the mmap system call to request memory from the operating system every time malloc is called. This is a terrible allocator! The mmap system call returns pages of memory to the program, meaning we’re requesting at least 4096 bytes of memory for every allocation, even if we only allocate an int. In addition to wasting an enormous amount of memory, free is not implemented so we never actually reclaim this space.

While the starter code implements a very bad allocator, it does work for some programs. Most notably, it does not work for programs that call malloc often enough that the operating system runs out of space. You can run some simple utities like ls or editors like vim if you don’t open a large file.

To run a program with your allocator instead of the default allocator, you will need to use the LD_PRELOAD environment variable. This is a special environment variable that ld, the dynamic linker (linker dynamic?), looks at before starting a program. Any libraries listed in this environment variable will be loaded before system libraries. That means any functions you implement in a preloaded library will replace the default implementation. There are some tricks to doing this correctly with a memory allocator (for one, the dynamic linker allocates memory, which creates some odd circular dependencies) so I have included a wrapper from a system called heap layers that takes care of most of this for you. You just need to implement functions called xxmalloc, xxfree, and xxmalloc_usable_size in allocator.c. Leave the files wrapper.h and gnuwrapper.cpp alone!

To run a program with your custom allocator, use a command line like the following (omit the dollar sign. This is how I show the shell prompt):

$ LD_PRELOAD=./myallocator.so ls

This assumes you’re running the command from the root of your project directory. Everything should run as it usually does. To make sure your allocator is actually running, you should break something. Go to the xxmalloc function inside allocator.c and add a line return (void*)-1; to the beginning of the function. After running make, test your allocator again. This time, the ls command should fail, most likely with a segmentation fault. Remove this intentional error now so it doesn’t cause problems later!

You can also run more complex programs like vim:

$ LD_PRELOAD=./myallocator.so vim

The program will probably start up okay, but if you open a large source file you’ll probably run out of memory and vim will crash. We’ll fix that later.

Speaking of crashes, you may find it useful to run programs with your allocator inside a debugger. To do this, you have to start things a little differently. This session starts the program ls with your custom allocator inside of gdb.

$ gdb ls
...
Lots of gdb startup messages go here
...
> set environment LD_PRELOAD=./myallocator.so
> run

If you want to start the program with command line options, they go immediately after the run command (on the same line).

Implementation Overview

Starting with our Very Bad Allocator™, you will implement what is often called a BiBoP allocator. BiBoP, short for “Big Bag of Pages”, is an old and well-known technique for allocating memory that uses a size-segregated heap. The job of a memory allocator is to request large blocks of memory from the OS, then parcel out smaller pieces of that memory to programs that request it. You could imagine an allocator that cleverly splits large blocks into smaller pieces, called allocated objects, when needed. A clever allocator could then coalesce adjacent freed objects into larger free objects so they can be used to satisfy larger requests in the future. While this approach has its place, it’s often not worth the effort. A size-segregated heap designates each large block of memory to hold objects of a single size. If you need to allocate objects of a different size you have to get them from a different block. This simplifies the allocator, and actually has some nice performance advantages as well. For one, we can store the size of allocated objects once per block instead of next to every object, so there is less wasted space for metadata.

It’s easiest to describe the structure of a BiBoP allocator by explaining how it handles requests.

Allocating Memory

Our allocator enters the picture when the main program calls malloc(27). The code in gnuwrapper.cpp handles some boilerplate, and it then calls xxmalloc(27). We’re supposed to find a 27 byte space in memory and return it to the program. Because a BiBoP allocator dedicates an entire block to one size of object, we don’t want to keep blocks around for every odd object size; instead, our allocator rounds up to the next power of two. Some systems also require a minimum alignment for allocated memory of 16 bytes, so you should round any smaller requests up to 16.

Now we’re looking for 32 bytes of free memory. A BiBoP allocator keeps a separate list of blocks (those are the large chunks of memory from the OS) for each object size. We’ll have eight lists of blocks, one to hold blocks dedicated to 16 byte objects, one for 32 byte objects, one for 64 byte objects, and so on up to 2048 bytes. We’ll handle larger objects by calling mmap directly.

The beginning of each list of blocks is reachable through a pointer to the beginning of the block. Keeping an array of these pointers makes it easy to select the appropriate list; take the log2 of the object size, then subtract log2 of the minimum object size. This gives you the index of the appropriate block list. You can compute the log base two of a power of two integer using the __builtin_ctz function described on the GCC Builtins page. In fact, you can write all of the rounding and size manipulation rountines for this lab without loops or large numbers of if or switch/case statements. To receive full credit, you should use reasonable, concise implementations when dealing with array indices or rounding.

Make sure you test your rounding code extensively. This has historically been the most common source of errors later in the lab.

In our case, we take log2(32), which is 5, and subtract the log of our smallest object, log2(16), which is 4. Looking in the array of block lists, we check index 1. If this pointer isn’t NULL it gives us the address of the first block we can use to hold 32-byte objects.

If the size 32 block list is not NULL

At the beginnng of the first block in the block list, we have a block header. This structure should hold the size of the objects on this block, the next pointer that takes us to the next block, and the freelist pointer. The freelist pointer points to the first free object in this block, which is guaranteed to be 32 bytes because of how the heap is structured. We’re going to allocate this free block of memory, so we should save it to a temporary, look inside the object to find a pointer to the next free object, then use this next free object as the head of the free list. What is happening here is that we’ve used the free space to store a linked list to record all the free space our allocator currently has available. Yes, this is a little mind bending; I will draw some pictures of this on the board.

If the freelist was NULL, this block does not have any free objects so we should move on to the next one. If none of the available blocks have free objects, proceed on to the next step.

If the size 32 block list is NULL

If there are no size 32 blocks, we need to make one by asking the OS for memory. To do this, call mmap as in the provided starter code to allocate PAGE_SIZE bytes. Once you have a page of memory, you should put a block header at the beginning of this available memory. Break the remaining memory on the page into 32 byte chunks (or whichever size is appropriate for the block list you are currently looking in). For simplicity and correctness, you should divide the remaining space into 32 byte aligned chunks; in general, you should align these chunks to the block’s object size. If you don’t do this, your allocator may not work and freeing memory will be very difficult. Add each chunk (a free object) to the block’s freelist. You should use the available space inside each free object to store a linked list node.

Freeing Memory

At some point later, the program calls free. The code in gnuwrapper.cpp does some boilerplate again and calls along to xxfree. All we have at this point is a pointer to somewhere inside the object. We don’t know the size of the object, and we don’t know if the pointer is at the beginning of the object or somewhere in the middle. The first step is to find the size of the object.

Finding an object’s size

You will need to know an object’s size to free it, but you’ll also have to implement xxmalloc_usable_size for the allocator to work properly. You can call xxmalloc_usable_size from inside xxfree, so you might as well start out implementing this helper function. Block headers go at the beginnings of pages, and every page starts at a multiple of PAGE_SIZE. Because of this, you can round a pointer down to the next multiple of PAGE_SIZE to find its header. Once you have the header you can simply read out the object size from the block header.

Finding the beginning of an object

Now that we know the size of the object being freed, we need to figure out where it starts (the program may be freeing from an interior pointer). Because you carefully pareceled out free objects aligned to the object size (you did do that, right?) this just requires rounding down to the next multiple of the object size.

Finally, you have pointers the block header and the beginning of the object being freed, so you can just add it to the block’s free list. It’s usually best to reuse freed memory as soon as possible, so add the freed object to the front of the free list.

Part A: Working with Sizes

Before writing any code in your memory allocator, you should implement the computations to round object sizes to the next power of two and select the appropriate block list. This code is difficult to test inside of your allocator because you cannot call functions like printf when you are inside of xxmalloc. The only POSIX functions you are allowed to call are the async signal safe functions listed on this page. You can see that I have added a slight hack to print error messages (sometimes) when things go wrong. If you set use_emergency_block to true you can make exactly one call to a function like printf that may call malloc to display an error, but you should plan to quit the program after doing this because the heap will almost certainly be corrupted when printf frees this memory. It’s not a great way to debug your program, but sometimes our options are limited when we’re messing with the innards of the runtime or operating system.

I recommend implementing your rounding code in a separate test program to make sure it works, then move it into your allocator once you trust it.

Part B: Allocating from Blocks

Implement xxmalloc for objects up to 2048 bytes. You will need the code from part A to choose the appropriate block list, along with new code to allocate new blocks, initialize the new block’s freelist, and to take a free object off the freelist and return it to the program.

We don’t have any code to compute the size of an allocated object yet, but you can assume that any allocated memory is at least 16 bytes. Update xxmalloc_usable_size so it always returns 16. The gnuwrapper.cpp file defines realloc, which calls xxmalloc_usable_size to check if an object can be resized. Once you’ve updated xxmalloc_usable_size to return the conservative lower bound of 16 you should still be able to run some simple programs, even though freeing does not work yet.

If you test a program and it crashes, check to see if xxmalloc is being asked to allocate more than 2048 bytes at a time (try a gdb breakpoint or use the emergency error message hack included in xxmalloc). If so, you may have to skip ahead to part D before you can test your allocator; many large object allocations happen as part of program startup and can be difficult to avoid.

Part C: Freeing Memory

Implement the size calculation and freeing process described above. You should have a complete implementation of xxmalloc_usable_size after this step. You may want to implement and test your rounding down procedure in a separate program where you can print test results, instead of testing it live in your allocator.

Part D: Large Objects

We don’t want to deal with large objects using our BiBoP structure, so we’ll just fall back to mmap. For any object larger than 2048 bytes, just round the mmap request up to a multiple of PAGE_SIZE and return the result to the program.

One difficult issue with large objects is in the free implementation. Pages that hold large objects do not have a block header, but we may go looking for one. A standard technique to determine whether a page has a block header is to add a magic number to the beginning of each BiBoP block. Add a field to your block header struct and fill it in with a recognizable constant (I particularly like 0xD00FCA75 or 0xF00DFACE). In your size computation, check for this constant in the block header before returning a value. If it isn’t there, you know you are looking at a large object instead of a BiBoP block.

If a large object is passed to xxfree you can simply disregard it.

Extra Credit Opportunity

If you would like to earn some extra credit, you can implement a special large object header that stores the size of the allocated object. You may need to go looking backwards for this header because large objects could span multiple pages. To mark the beginning of a large object, use a different magic number. Once you’ve found this point, you can free the large object back to the OS by calling munmap.

There are other, better strategies for supporting large objects, like reserving a dedicated large object area. If you want to attempt one of these alternative strategies I can help you plan a reasonable implementation.

Testing your Allocator

Your final submitted allocator should work on simple command line utilities, as well as with gcc, vim, top, and the lynx command line web browser. In addition to testing your allocator wiht some simple programs, you can use the same grading script I will use to assign a portion of your grade on this lab. You can find this code at https://github.com/csc213/malloc-grader. To use the grader, run make in the grader directory, then run the grader program with LD_PRELOAD set to use your allocator. The grader enforces some properties that are specific to this allocator design, so even the standard libc malloc will not receive full credit on this lab. The requirements for your allocator are:

  1. All allocated memory must be writable and readable
  2. Allocated memory up to 2048 bytes should be rounded up to a power of two size, with a minimum of 16 bytes. Allocations over 2048 bytes should be rounded up to the next multiple of 4096 bytes.
  3. Allocated objects up to 2048 bytes must be aligned to their power of two size. For example, malloc(24) should return a pointer to 32 bytes of memory that is aligned at an address that is a multiple of 32.
  4. Allocated memory should not overlap other allocated memory.
  5. Freed memory must be reused.
  6. Your allocator should return all the objects on a block before moving to a new one.
  7. Your allocator should return sufficient memory for large objects.

If you find that you are failing some of these tests, make sure your implementation of xxmalloc_usable_size is working correctly. This function is an important part of the testing process, so an incorrect implementation can cause many other tests to fail.

As usual, once your lab is complete you should submit your work with a pull request.