Lab 13: Build a Cache

In this lab, you will implement a direct-mapped cache for read-only memory in Logisim. While this lab is required, it includes an open-ended component at the end where you can earn some extra credit for making improvements to the cache. You must make at least one improvement, but you can add additional functionality to the cache for up to an assignment worth of extra credit (three additional improvements).

Assigned: Tuesday, November 29

Due: Friday, December 9 by 5:00pm

Collaboration: There are no assigned groups for this lab. You may work individually or in pairs.

Submitting your work: Submit your work by email. You should submit answers to part A, your Logisim file, and a description of the problem or problems you fixed for part E.

Part A

The memory system you are implementing will use eight bit addresses and a four-entry cache with four byte cache lines. Before starting your implementation, answer the following questions:

  1. How many bits will be used for the byte offset in this cache?
  2. How many bits will be used for the cache index?
  3. How many bits will be used for the cache tag?
  4. How do you know when there is a cache miss? Don’t forget about the “valid” bit in each cache entry, which will start at zero for an empty cache.

Part B

Start Logisim with the following command:

$ java -jar /home/curtsinger/bin/logisim.jar &

Create a new Logisim file named cache.circ. You will build the main cache in the main subcircuit, but first we need to build a single cache entry. Create a subcircuit called cache-entry and add the following input pins:

tag
The tag for the requested address
byte_offset
The byte offset portion of the requested address
clock
The clock signal for the cache entry (1 bit). Note: Do not use a clock element here; you should use an input pin so all the copies of your cache entry subcircuit can be connected to the same clock.
data_in
The new data that should be cached for the current tag (32 bits)

You should also set up the following output pins:

data_out (8 bits)
The cached data at the specified byte offset
miss (1 bit)
If set, the cache entry missed and a value should be loaded from memory

Once you have set up your subcircuit inputs and outputs, you will need to add six registers to your entry. Four eight bit data registers to hold the four bytes held in this cache entry, a one bit valid register that indicates when a cache holds valid data, and a tag register to store the tag for the currently cached data.

Part C

Use the byte_offset input and a multiplexor to select the appropriate cached byte in your cache entry subcircuit.

Use a comparator element (available under Logisim’s “Arithmetic” category) to check whether the tag input pin matches the tag stored in your tag register. Use the result of this comparison and the value stored in the valid register to determine whether or not there was a cache miss.

When there is a cache miss, the main circuit will load the requested data from memory and pass this to the data_in pin. Use a splitter (available in Logisim’s “Wiring” category) to break the data_in pin into four separate bytes to be stored in each of the four data registers.

Set up your cache entry to store the value passed in on data_in when there is a cache miss. Once you have tested your cache entry circuit, raise your hand and I will double-check it with you.

Part D

Use your completed cache entry subcircuit to build a four-entry cache in the main subcircuit. You will need the following elements:

Four cache-entry subcircuits
The actual cache entries
An address input pin
The address being loaded from cache/memory (8 bits)
A data_out output pin
The data read from cache/memory (8 bits)
A ROM element
The actual memory structure, available under Logisim’s “Memory” category.
A clock
The clock will trigger reads from ROM and updates to the cache entries

Set up your ROM element to use 32 bit data width; it will return all four bytes for a cache entry in one read. Because you will load four bytes at a time, the ROM element will only need six bit addresses. These will be the six most-significant bits of the requested address; all four of the bytes selected with the byte offset bits will be returned in the 32 bit result.

  1. Use a splitter to break your address input into the tag, index, and byte offset bits.
  2. Pass the tag and byte_offset values to all four cache entries.
  3. Use a multiplexer to select the data output from the appropriate cache entry. How do you know which cache entry to use?
  4. Use a decoder and and gates to control which cache entry gets a clock signal. Tou should only pass a clock signal to one cache entry at a time; if the cache misses, the clock signal will cause the cache entry to store the new data passed to data_in.
  5. Connect the ROM element to the clock and its output to the data_in pins on each cache entry.
  6. Use a splitter to control the read address of the ROM element. Don’t forget to take off the two lowest bits.

Part E

There are a few problems with this cache implementation. Fix at least one of these problems. If you fix more than one, you may earn extra credit.

  1. The circuit reads data from ROM on every clock cycle, even if there is a cache hit. Modify the circuit to use the miss output from the appropriate cache entry to determine whether ROM is accessed. Hint: the sel input will be useful for controlling whether or not memory is accessed.
  2. It is hard to tell when this cache is working well. Use Logisim’s counter components to count total cache accesses and hits/misses so you can compute the hit rate.
  3. This cache is direct-mapped, so it has a lot of conflicts. Change your cache entry to a two-way set-associative cache. You do not need to implement an LRU replacement policy; you can instead use the Logisim Random Generator element (in the “Memory” category) to pick an entry to evict at random.
  4. This cache does not support writing to memory. Add write_data and write_enable pins to the main circuit. You will need to change your ROM element to a RAM element. You should implement a write-through policy; every write goes to both the cache and RAM. Other policies are more complex.
  5. Memory accesses take the same amount of time as register accesses in this cache implementation, which isn’t terribly realistic. Modify your implementation so data is not available until ten cycles after a cache miss. You will need to add an output pin to the cache that indicates when a value is ready. Hint: You can use chained registers or comparators and counters to insert these delays.

If there is a different problem with this cache that you would like to fix instead, talk to me or send an email describing the problem and your planned fix.