Keeping Track of Tasks with Restricted
Access Collections
Summary:
We consider a few related data structures that are frequently used to
collect values and provide value-by-value access to the collection.
Together, these data structures are sometimes called
restricted
access collections. Individually, they are called queues, stacks,
and priority queues. We also consider a common application of these
data structures: keep track of tasks left to do.
Introduction: Deep Recursion, Revisited
As you may recall, one of the topics we've been considering recently
deep recursion, a type of recursion in which
each procedure call may make multiple recursive calls to itself.
(In contrast, traditional linear recursion, each procedure call makes
at most one direct recursive call.) Deep recursion is particularly useful
for processing trees, since we can recurse on both the left and right
subtrees. For example, we might compute the size of a tree with the
following procedure:
In this case, there are either zero or two recursive calls per node.
We've also used deep recursion to generate fractals, as in
the fractal-rectangle procedure. A simplified
version of that procedure follows.
In this case, there are either zero or four recursive
calls per procedure call.
One of the issues associated with deep-recursive procedures, like
fractal-rectangle, is that there are lot of
hidden delayed procedure calls built up somewhere.
For example, while we're in the middle of drawing the upper-left
corner of a rectangle, we have some recursive calls built up for the
rest of that smaller rectangle and some recursive calls built up for
the rest of the rest of the bigger rectangle (and, if we have multiple
levels of fractal recursion, even more recursive calls built up for
intermediate rectangles). However, programmers do not have access to
these recursive calls - once they've been set up, they will happen at
some time determined by the computer (and by the structure of the
program).
Because the programmer does not have access to those delayed calls, it
is not possible to modify them (say, to stop drawing midway through or
to de-prioritize some activities) or even to check what is happening.
As an alternate, some programmers make the remaining things to
do
an explicit parameter to the procedure, rather than implicit
from the call structure of the procedure.
In effect, the programmer needs a data structure to keep track
of the tasks left do. Instead of making lots of recursive calls, a
procedure will simply add tasks to the collection. When done with one
call, the program will grab the next task and continue. We're done
when there are no tasks left. Collections that provide the operations
of add an element
, get the next element
,
and remove that element
are called restricted
access collections.
Restricted access
means that the collection allows access
to only one element at any given time.
Interestingly, the policy by which we choose that one element
(either to use or remove) can have a significant effect
on the behavior of the procedure. We'll explore that behavior a bit
in this reading and the corresponding lab.
Primary Procedures
In many ways, these restricted access collections,
used to collect tasks, are like
the to-do lists that many of us maintain: We add tasks, we remove
tasks as we do them, we check if we have tasks left. The big difference
is that we'll be a little more structured in the access we provide
to tasks. Let's first describe what we want the procedures to do
and then come back to the implementation a bit later.
One of our first decisions in thinking about collections of tasks
is whether they should be mutable (as is the case with vectors)
or whether we build new collections from old, by adding or removing
elements (as is the case with lists). The common custom in Scheme
programming is to avoid mutation as much as possible, so we will make
collections non-mutable.
Of course, that decision has some implications for how we remove
elements. On paper, we might simultaneously find and remove the
next task. In code, it's useful to separate those activities. Why?
Well, consider what happens if we grab and remove a task. In effect,
we'll need that procedure to return two values: the next task and
the modified collection of tasks. Although it is possible to have a
procedure return two values, it is simpler to separate the activities.
Given those decision, there are five basic operations we can use
for manipulating collections of tasks: create
a new collection of tasks, add an element to
the collection, get the next element to process,
remove that element, and check
whether or not the collection contains any more tasks to process.
Since these collections have arbitrary size, the procedure to create
new collections needs no parameters.
To get the next element from the collection, we need only the collection.
Note that different collections may have different policies for getting
a value, so we do not specify the policy in this documentation.
Why might different collections have different policies? Well,
consider a typical task list. Some people do tasks in the same order
in which they are added. Others associate priorities with their tasks.
Others choose tasks randomly. Still others have more complex policies.
We'll revisit those policies in a subsequent section. For now, we
simply leave the particular decision of what element to get unspecified.
We'll leave the same issue unspecified in
collection.drop.
Since some people prioritize their tasks, we'll include a priority in
the collection.add procedure. In many cases, we'll
ignore that priority. However, it's easiest to include it
for the few cases in which we need it.
Finally, we'll need a way to make sure that there are tasks left
to process.
Using Restricted Access Collections
Let's consider how we could use these techniques to make the collected
recursive calls in simple-fractal-rectangle!
explicit, rather than implicit. We'll explicitly keep track of a
collection of the sub-rectangles left to draw. We can represent each
sub-rectangle left to draw. draw as a six-element vector (color,
left, top, right, bottom, level).
Now, what do we do with that collection? We repeatedly grab the first
rectangle and process it as appropriate (sometimes adding more rectangles
to draw). For clarity, we'll use separate processes to do the repetition
(kernel) and the processing of individual
rectangles (process-rect). The kernel is
relatively straightforward.
(letrec ((kernel
(lambda (tasks)
(if (collection.empty? tasks)
'Success
(kernel (process-rect (collection.get tasks)
(collection.drop tasks)))))))
...)
What should process-rect look like? For the
base case (level is 0), we just draw the rectangle. For the
recursive case, we need to build four tasks (well, descriptions of
rectangles) and add each to the collection.
(let ((midcol (round (* 0.5 (+ right left))))
(midrow (round (* 0.5 (+ bottom top))))
(nextlevel (- level 1)))
(let* ((rect1 (vector (rgb.lighter color)
left top midcol midrow (- level 1)))
(rect2 (vector (rgb.darker color)
midcol top right midrow (- level 1)))
(rect3 (vector (rgb.darker color)
left midrow midcol bottom (- level 1)))
(rect4 (vector color
midcol midrow right bottom (- level 1)))
(c1 (collection.add remaining-tasks rect1 0))
(c2 (collection.add c1 rect2 0))
(c3 (collection.add c2 rect3 0))
(c4 (collection.add c3 rect4 0)))
All that's left is to decide what rectangle the collection should start with.
That's easy enough: We start with a collection that contains the
whole rectangle. Putting it all together, we get
You'll explore this version of the procedure in the lab.
Policies for Getting Values
As we mentioned earlier, there are many different policies that can be used
for selecting the next value to process. One of the most common policies
is first-in, first-out (FIFO). In this policy, the
value returned by collection.get is the first of the
remaining values. That is, if we add A to the collection before B, we
process A before B. Restricted access collections that implement a FIFO policy
are typically called queues.
Another common strategy is to assign a numeric priority to each value,
and to remove values according to their priority. (Lower numbers
represent higher priorities.) Restricted access collections that implement a
priority policy are typically called priority queues.
You'll note that neither of these is the strategy that Scheme seems to use
when processing procedures. That is, if you've set up six recursive calls,
and the first of those recursive calls does four more recursive calls, those
newer calls are done before the other calls, rather
than after (as in FIFO). There are also strategies that humans use that
don't match the other two. For example, some people always put the
mail they've just received on the top of their inbox, and process their
inbox from top to bottom. In both cases, the strategy seems to be much
more one of last in, first out (LIFO). Restricted
access collections that implement a LIFO policy are typically called
stacks.
Queues, priority queues, and stacks are the three most common
restricted access collections. We implement each in different ways.
Implementing Stacks
We'll consider only stacks, and only one implementation of stacks.
When implementing these kinds of collections, we might use lists,
trees, or vectors. Since we're building non-mutable structures, and
accessing the values in those structures one at a time, it seems that
lists are the easiest possible underlying structure.
A new collection is simply the empty list.
To add something to the collection, we cons it onto the front (ignoring
the priority).
The next thing to process is therefore the car.
Similarly, to delete something, we just take the cdr.
And empty lists represent empty collections.
Here's the code that puts all of those ideas together.
Selecting an Implementation
The implementations of queues and priority queues is a bit more complex,
so we will leave them until later.
Suppose we've implemented stacks, queues, and priority queues.
How do we decide which our program uses? If a procedure requires
a particular kind of restricted access collection, we use those names
(e.g., stack.add) rather than the generic
names (e.g., collection.add). If we want to
use the generic names, so that we can experiment with the strengths
and weaknesses of various implementations, we can simply tell Scheme
to treat things as equivalent. For example, if we want to use the
stack implementation, we might write
(define collection.new stack.new)
(define collection.get stack.get)
(define collection.drop stack.drop)
(define collection.add stack.add)
(define collection.empty? stack.empty?)
In the lab, you'll use this technique to explore each kind of
restricted access collection.