CSC 153	Grinnell College	Spring, 2009

Computer Science Fundamentals

Lists in C

Summary

This reading outlines the creation, processing, and deletion of lists in C.

Acknowledgement

Most of this reading is an edited version of Henry M. Walker, Computer Science 2: Principles of Software Engineering, Data Types, and Algorithms, Little, Brown, and Company, 1989, Sections 5.1-5.2, with programming examples translated from Pascal to C. This material is used with permission from the copyright holder.

Linked Lists and the Concept of Pointers

The concept of a linked-list structure may be motivated by considering how we might maintain lists containing tasks we need to do. Initially we might write down names of tasks as we think of them. Then we might number these tasks in the order in which we plan to do them. As we think of additional work to do, we might write down each new task (perhaps at the bottom of the paper) and then revise the numbering that indicates the order for doing the work. Such a list is shown below.

5.	Do Homework
1.	Go Swimming
4.	Mow Lawn
2.	Buy Birthday Present
3.	Clean Desk

This table suggests several important characteristics of lists. On a list, one task (Go Swimming) is designated as the first item, and another (Do Homework) is the last. The ordering of the list items is specified by the numbering, not by the physical location of the items. For example, Do Homework appears as the first line of the table, but that task will be done last in the work schedule. On the other hand, Go Swimming appears on the second line of the list, but it will be done first. These observations are highlighted in the following figure, where the list is shown in a different form, one that deemphasizes the physical location of items while stressing their order. With this structure, each item on the list can be reached starting with the first item and following the pointers from one item to the next.

In this figure, separate tasks are not written on different lines on paper; rather, each list item is represented as a box with two parts:

the name of the task for the list entry, and
a place to indicate which box is next (in the diagram, we indicate this next box with an arrow).

In this representation, the Task Name includes alphabetical characters. The arrows for the Next part of the box, however, represent a different type of data, called a pointer, which indicates where other pieces of information are located. In the figure, a box contains both a task name and an arrow or pointer that specifies another box. Arrows do not contain the task names; instead arrows point to boxes, and the boxes contain the task names.

The above figure also illustrates two other features needed in the representation of lists. First, a special pointer is used to specify the location of the initial item on the list. Second, a special symbol indicates the last element on the list because the last box does not have an arrow pointing onward. In the diagram, we drew a diagonal line through the Next part of the box for the last item, Do Homework.

Operations on Linked Lists

In addition to this logical structure, a simple linked list abstract data type involves several operations, including finding and perhaps changing a data item, printing data (either in order or in reverse order), deleting an item, or inserting a new item. Each of these operations conceptually is quite simple, but some details for item deletion and insertion require a bit of care. In what follows, the steps for each operation are illustrated by considering the simple linked list shown in the following figure, where the data consist of the numbers 2, 4, 6, and 8.

Finding a Data Item

When locating a particular data item in a linked list, we start with the first item on the list and then proceed from one item to the next until we find the desired piece of data or until we reach the end of the list.

Outline for Finding a Data Item

Prepare to examine first item.
Continue until data found or no more items remain.
1. Compare the data desired with the data stored in this item.
  1. If the data match, note data found.
  2. If the data do not match, prepare to examine next item.

As an example, the following figure shows the steps required to find the number 6 on the list of our previous example.

Locating a Data Item on a List: Locating 6 on the List (2 4 6 8)

Throughout this process, we must be careful to distinguish between pointers and list items. One pointer specifies the first item on the list, and a second pointer identifies the item to be checked next. The second pointer begins at the first item and then moves from item to item as we check data in subsequent items. In all this work, the pointers indicate the location of a specific list item, and checking data requires looking within the item itself.
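
The outline for finding a data item can be sketched in C as follows. This is a minimal illustration rather than code from the original text: the type name intNode and the function name findItem are hypothetical, and the pointer notation used here (*, ->, and NULL) is explained later in this reading.

```c
#include <stddef.h>

/* Hypothetical node type for the example list (2 4 6 8). */
struct intNode
{  int data;
   struct intNode * next;
};

/* Return a pointer to the first node holding target,
   or NULL if no node on the list holds it. */
struct intNode * findItem (struct intNode * first, int target)
{  struct intNode * current = first;   /* prepare to examine first item */

   /* continue until data found or no more items remain */
   while (current != NULL && current->data != target)
      current = current->next;         /* prepare to examine next item */

   return current;   /* the matching node, or NULL at the end of the list */
}
```

Note that first is never changed: only the second pointer, current, moves from item to item.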

Printing the Data on a List

Printing the data on a list follows much the same process as finding a particular item except that we keep printing successive items until no more items are present. We do not stop part way through.

Outline for Printing List Data
The following outline for printing follows the same general form as the outline for finding a data item:

Prepare to print data in first item.
Continue until no more items remain.
1. Print the data at the current item.
2. Prepare to examine the next item.

In both of these outlines, processing starts at the first item on the list. We then follow the arrows from one item to the next as we move along the list.

Printing a List in Reverse Order

In contrast, printing the data in reverse order requires a different approach. A simple list contains no separate arrow to the end of the list. In addition, there is no arrow from an item to the preceding one (only to the next). When printing items in reverse order, therefore, we cannot proceed directly from the last to first. Instead, we identify a pattern for processing an item, and we apply this process recursively through the list.

Outline for Printing a List in Reverse Order
For each item we want first to print the rest of the list; then we print the item itself. This yields the following outline, in which step B is recursive:

A. Start with the first item.
B. Print the list backward.
1. Print the rest of the list. (Follow this step B, starting with the next item, if any.)
2. Print the first item.

When this algorithm is applied to a list, later elements are always printed before earlier ones because of the placement of step B.1 before step B.2. The list, therefore, must be printed backward. If these steps were reversed in the outline, earlier items would be printed first, giving a recursive algorithm for printing the list in order.
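
As a sketch of this recursive outline in C, we might write the following. The names intNode and printReverse are hypothetical, and the out parameter is an assumption added here so that the same function can write to the screen (stdout) or to a file.

```c
#include <stdio.h>

/* Hypothetical node type for the example list (2 4 6 8). */
struct intNode
{  int data;
   struct intNode * next;
};

/* Print the list that starts at item, last item first. */
void printReverse (const struct intNode * item, FILE * out)
{  if (item == NULL)       /* no items remain, so nothing is printed */
      return;

   printReverse (item->next, out);       /* first print the rest of the list */
   fprintf (out, "%d\n", item->data);    /* then print this item */
}
```

For the list (2 4 6 8), the call printReverse (first, stdout) prints 8, 6, 4, and 2 on separate lines. Swapping the two statements at the end of the function would print the list in order instead.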

Deleting a Data Item

As with the printing operation, the deletion of an item from a list may be done either iteratively (without recursion) or recursively. The overall strategy for deletion is illustrated in the following figure, where the third item (6) is deleted from the list (2 4 6 8). In the new list, the 4 box no longer specifies that the 6 box comes next; rather, the pointer for the 4 box indicates that the 8 box comes next. With a small change in a pointer, the 6 box is no longer on the list, since we cannot reach that box by starting at the beginning of the list and moving from one item to the next. Even if the box is physically present somewhere, it is lost for all future work because there is no way to locate it. Beyond changing pointers, we also may decide to throw away the old item that we deleted so that we can use that space again.

This example illustrates the main steps involved in the nonrecursive algorithm for deleting an item from a linked list. The following figure shows these steps more carefully.

As the figure shows, the ideas behind this deletion are fairly simple. When writing an outline to perform this task, however, there are two complications. First, the deletion of the first item in a list requires a special case, since we must now designate a new first item. Second, when we find which item to delete, we still must keep track of the previous item. For example, in the figure, when we locate the 6 box, we must also be able to locate the 4 box.

This second point emphasizes that in moving through linked lists, it is easy to go from any particular list item to later ones by following the arrows. We cannot follow the arrows backward, however, so it is difficult to back up toward the front. Each list item contains information about the next item, but there is no information about previous ones. Thus, to delete a list item, we must explicitly keep track of previous items as we search. From the preceding item, we can move ahead easily to the item we will actually delete.

These comments motivate the following outline for deleting an item from a list.

Outline for Deleting an Item (Without Recursion)
Determine whether item to be deleted appears first on list.

If so,
1. Move first pointer to new first item on list.
2. Throw away the old item.
If not,
1. Find item to be deleted on list, keeping track of the previous item as the search continues.
2. Change the pointer of the previous item to specify the next item.
3. Throw away the old item.

Many of these same ideas also apply to the recursive approach for deleting an item, although the focus changes to possible actions for a single item, when two possibilities may occur. First, the item specified may be the one to be deleted, in which case we follow the ideas of step A above. Alternatively, this may not be the specified item, in which case deletion will occur later in the list. This generates the following algorithm, in which the second step is recursive.

Outline for Deleting an Item Using Recursion

A. Start with the first item on the list.
B. Determine whether item to be deleted appears first.
1. If so,
  1. Move the first pointer to the new first item.
  2. Throw away the old item.
2. If not, delete desired item from remainder of list. (Apply this step B to the rest of the list.)

In this revised outline, all details for moving down the list are included in the recursive step B.2, and the resulting outline is somewhat shorter and more straightforward than the nonrecursive version.
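
The recursive outline might be sketched in C as shown below. The names intNode and deleteItem are hypothetical, and the sketch assumes that every node on the list was created with malloc, so that an old item can be thrown away with free. The function returns the first node of the resulting list, which lets the caller update whichever pointer led to this part of the list.

```c
#include <stdlib.h>

/* Hypothetical node type for the example list (2 4 6 8). */
struct intNode
{  int data;
   struct intNode * next;
};

/* Delete the first node holding target from the list starting at first.
   Returns the (possibly new) first node of that list. */
struct intNode * deleteItem (struct intNode * first, int target)
{  if (first == NULL)              /* end of list: target is not present */
      return NULL;

   if (first->data == target)      /* item to be deleted appears first */
   {  struct intNode * rest = first->next;   /* the new first item */
      free (first);                          /* throw away the old item */
      return rest;
   }

   /* otherwise, delete the item from the remainder of the list */
   first->next = deleteItem (first->next, target);
   return first;
}
```

A typical call is first = deleteItem (first, 6). No variable for the previous item is needed, because each call only ever deals with the head of the list it was given.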

Inserting a Data Item

The insertion of an item into a list may be done either iteratively or recursively, and the overall result is illustrated in the following figure, where 5 is inserted into a new third box on the list. In the new list, we created a new box, placed the 5 as the data for this box, made the pointer of the 5 box indicate that the 6 box comes next, and changed the pointer of the 4 box to this new list item. In this insertion process, we can build the 5 box at any convenient place; then we add this box to the list by changing pointers appropriately.

This example suggests the main steps involved in inserting an item into a linked list.

Determine where item will be inserted.
Create a new box.
Place data in the new box.
Make pointer of new box specify the appropriate next element.
Update the pointer in the previous box to specify the newly created box.

As with list deletion, the details for a direct, iterative approach for this insertion include two complications. First, to insert an item at the beginning of a list, we must update the first pointer rather than the pointer in the preceding list item. Second, in finding where to insert the new item, we must identify the item that will precede the new item. In the example, we must know that the 4 box will precede the 5 box. When we keep these details in mind, we can expand the above steps for list insertion into a complete outline.

These same ideas also apply to an alternative approach that focuses on the work required at an individual point on the list. In particular, if we focus on inserting an element before the head of the list, we obtain a recursive solution that eliminates some of these complications.
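
A minimal sketch of such a recursive insertion follows. It assumes that the list is kept in ascending order, as in the example where 5 is placed between 4 and 6; the names intNode and insertInOrder are hypothetical. Each call either places the new box before the head of the list it was given or passes the work on to the rest of the list.

```c
#include <stdlib.h>

/* Hypothetical node type for the example list (2 4 6 8). */
struct intNode
{  int data;
   struct intNode * next;
};

/* Insert value into the ascending list starting at first.
   Returns the (possibly new) first node of that list.  If no
   memory is available, the list is returned unchanged. */
struct intNode * insertInOrder (struct intNode * first, int value)
{  if (first == NULL || value <= first->data)
   {  /* the new item belongs before the head of this list */
      struct intNode * newNode = malloc (sizeof *newNode);  /* create a new box */
      if (newNode == NULL)
         return first;
      newNode->data = value;      /* place data in the new box */
      newNode->next = first;      /* new box specifies the old head as next */
      return newNode;             /* caller updates its pointer to the new box */
   }

   /* otherwise, insert into the remainder of the list */
   first->next = insertInOrder (first->next, value);
   return first;
}
```

With first pointing to the list (2 4 6 8), the call first = insertInOrder (first, 5) yields the list (2 4 5 6 8). Insertion at the beginning of the list needs no separate case, since the caller always stores the returned pointer.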

Implementation of Linked Lists with Pointers

The reading to this point has introduced the concept of linked lists in picture form, with arrows or pointers going from one list item to the next. In this section, we translate this representation of lists into C code using a new data type called pointers. With these C pointers, the pictures of the list operations carry over in a straightforward way into corresponding functions.

As with the conceptualization of pointers and lists in the previous discussion, C is careful to distinguish between the pointers themselves and the items that the pointers specify. In C, a new symbol, the asterisk *, is used in conjunction with pointers, and this symbol helps us make this distinction. When a pointer specifies a box or struct with multiple fields, C allows us to use an arrow -> as a somewhat cleaner notation. Overall, the discussion of pointers in a program involves several important topics:

pointer type and type declarations,
variable declaration and initialization,
manipulation of pointers and items, and
storage of list items.

To illustrate the discussion, we consider the following problem.

Problem: Maintaining a List of Names

Write a program that will maintain a list of names. In particular, the program should allow us to insert a new name after a specified name, delete a name, and print the list of names.

A C Program with Commentary

The program ~walker/c/lists/namelist.c provides a careful solution to this problem. The following paragraphs explain this code in some detail.

Declarations

Declarations must specify two types of objects: list items and pointers to list items. Following the discussion of list items above, the list items themselves will contain two parts: data and a pointer to another item. The following declarations define these elements:

   /* Maximum length of names */
   #define strMax 20

   struct node
   {  char data [strMax];
      struct node * next;
   };

   struct node * first;  /* pointer to the first list item */

In this declaration, first designates a pointer to a node; the asterisk * specifies a new pointer data type, which is a pointer or arrow to a specified type (node). A node itself is a struct with two parts: a data field to hold names and a next field to point to another node. This declaration exactly parallels the concept of a list node as a box with two parts.

After declaring these new types that involve pointers, variable declarations specify variables that will indicate nodes on the list. For example, each of the following variable declarations specifies a new variable that will designate a list element.
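
The declarations themselves do not appear in this copy of the reading. As a hedged illustration only, declarations of roughly the following shape would fit the discussion. The variable names are taken from later sections of this reading, and the struct declaration is repeated here only so that the fragment compiles on its own.

```c
/* Maximum length of names */
#define strMax 20

struct node
{  char data [strMax];
   struct node * next;
};

struct node * listElt;   /* moves along the list as items are processed */
struct node * newNode;   /* designates a newly created list item */
struct node * prevPtr;   /* designates the item just before the current one */
```

Each of these variables is only an arrow. Declaring it does not create a box for it to point to.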

Initialization

As we discussed lists in the beginning of this reading, we had to recognize when we reached the end of the list. This was accomplished graphically by placing a line through the Next part of a box.

In C, the value NULL is used to indicate that a pointer does not specify a new node. Using this value, we normally initialize a first pointer to NULL, since the program begins with no items on the list. As names are added, first is updated to point to new nodes. Further, this NULL value allows us to check when we reach the end of the list. When a next field is NULL, we know we have found the last list node.

Manipulation of Pointers and Items

Once a list of names has been built, we may wish to manipulate pointers and items so the list data can be printed.

From the discussion of printing earlier in this reading, we declare another variable listElt that we move along the list as we print. Processing starts by examining the first element on the list. At first, listElt points to the same item as the first pointer, and we use the assignment listElt = first. The assignment statement makes the two pointers indicate the same item. At this point, first and listElt are arrows that point to the first list item (as shown in the first part of the following picture).

To work with the box at the end of an arrow, we add an asterisk * before the pointer. Here, listElt is the arrow and *listElt is the item specified by the arrow. The asterisk distinguishes between a pointer and an item itself.

Once an item is specified by *listElt, we work with it just as with any other variables already encountered. In this case, *listElt specifies a node that is a struct. To access the data in the struct, we precede a field by a period, so .data specifies the name within the box. Putting this field together with the box specification, (*listElt).data designates the data field inside the box pointed to by the pointer listElt. Since this notation (with parentheses, an asterisk, and a period) is a bit cumbersome, C has an alternative notation using an arrow ->, giving rise to listElt->data. To print this field in a program, we state:

   printf ("%s\n", listElt->data);

When we want to move from one box to the next, a similar sequence allows us to update listElt.

pointer to the current box in the list: listElt
the current box itself: *listElt
the next pointer in the current box: listElt->next

Thus, in moving from one list item to the next, we update listElt to the next pointer, listElt->next. This involves the assignment:

   listElt = listElt->next;

This discussion allows us to write the code for printing a list.
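
The printing code itself is not reproduced in this copy of the reading, so the following is a sketch assembled from the statements above rather than the procedure from namelist.c. The function name printList is hypothetical, and the returned count is an addition made here so that the result is easy to check.

```c
#include <stdio.h>

/* Maximum length of names */
#define strMax 20

struct node
{  char data [strMax];
   struct node * next;
};

/* Print each name on the list, one per line, starting at first.
   Returns the number of names printed. */
int printList (struct node * first)
{  struct node * listElt = first;    /* start with the first item */
   int count = 0;

   while (listElt != NULL)           /* continue until no more items remain */
   {  printf ("%s\n", listElt->data);   /* print the data at the current item */
      listElt = listElt->next;          /* move to the next item */
      count++;
   }
   return count;
}
```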

This procedure illustrates several important features about using pointers in C. We can use assignment statements to change what a pointer is pointing at. In addition, a NULL value allows us to determine when we come to the end of a list. The obvious test for not NULL in a while loop is:

   while (listElt != NULL)

However, a NULL pointer compares equal to 0, and C treats this value as false in a condition, while any pointer that is not NULL counts as true. Thus, a simpler test within a while loop is

   while (listElt)

Finally, we can move from a pointer to the item pointed at by adding an asterisk * to the variable. This is called dereferencing the pointer. The pointer itself gives a reference to an item. Adding the asterisk * specifies the item itself.

Storage Allocation

Now that we have seen how to move from one list item to another to print list data, we consider the creation, storage, and elimination of list items. Up to now, storage for variables was created in an area called the run-time stack each time functions were called, and this space was freed when the functions finished. Such storage represents static storage allocation; within a function, this storage does not change. (The C standard itself calls the storage for local variables automatic and reserves the word static for variables that last for the whole run of the program.)

In contrast, items specified by pointers can be created and destroyed within a function. Such storage is called dynamic storage allocation. For example, we must explicitly create some space to add a new item to a list, and we will explicitly dispose of an old item when it is deleted from the list.

The dynamic allocation and deallocation of storage space are performed with two new library functions, malloc and free, which are declared in the header <stdlib.h>. To see how these functions work, suppose listElt is declared as a pointer to a node; then:

   listElt = (struct node *)malloc(sizeof(struct node));

allocates a new box for an item, and the variable listElt points to that new space. Subsequently,

   free (listElt);

deallocates the box pointed to by the variable listElt.
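
One detail worth adding is that malloc returns NULL when it cannot supply the requested space, so a careful program tests the result before using it. The following sketch shows one way to package allocation and deallocation. The function names allocNode and releaseNode are hypothetical and do not come from namelist.c.

```c
#include <stdlib.h>

/* Maximum length of names */
#define strMax 20

struct node
{  char data [strMax];
   struct node * next;
};

/* Allocate one empty list item.  Returns NULL when malloc
   cannot supply the space, so callers should test the result. */
struct node * allocNode (void)
{  struct node * listElt = (struct node *)malloc(sizeof(struct node));

   if (listElt != NULL)
   {  listElt->data[0] = '\0';   /* start with an empty name */
      listElt->next = NULL;      /* not yet linked to another item */
   }
   return listElt;
}

/* Deallocate an item obtained from allocNode; passing NULL is harmless. */
void releaseNode (struct node * listElt)
{  free (listElt);
}
```

After an item has been released, the pointer still holds the old address, but the box no longer belongs to the program and must not be used again.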

Special Cases for Nonrecursive Insertion and Deletion

To illustrate how insertion and deletion functions are used, we consider two special cases that occur in the name problem.

Case 1: Insertion of a New Name at the Start of a List. The steps required to add an item to the start of a list are shown in the following diagram.

Insertion of a New Name at Start of List

Here a pointer variable newNode is first declared to point to a node. Then the insertion operation itself begins by allocating space for the newNode with the statement:

   newNode = (struct node *)malloc(sizeof(struct node));

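We next add the name as data for the item. For example, if reading is done using the scanf function, we might specify the following, where the field width 19 keeps the name and its terminating null character within the strMax characters of the data array:

   scanf("%19s", newNode->data);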

The final step is to update the various arrows. The previous head of the list comes after the newNode, and newNode should appear at the start of the list. This changing of pointers is done with the following code:

   newNode->next = first;
   first = newNode;
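
Putting these steps together gives a sketch such as the following. The function name insertFirst and its return value are hypothetical, and the name is passed as a parameter rather than read with scanf. As in the deletion code of Case 2, the parameter firstPtr is assumed to be a pointer to the first pointer, so that the function can change which node is first.

```c
#include <stdlib.h>
#include <string.h>

/* Maximum length of names */
#define strMax 20

struct node
{  char data [strMax];
   struct node * next;
};

/* Insert name at the start of the list whose first pointer is *firstPtr.
   Returns 1 on success, or 0 if no space could be allocated. */
int insertFirst (struct node ** firstPtr, const char * name)
{  struct node * newNode = (struct node *)malloc(sizeof(struct node));

   if (newNode == NULL)
      return 0;

   /* add the name as data, truncating it if necessary to fit */
   strncpy (newNode->data, name, strMax - 1);
   newNode->data[strMax - 1] = '\0';

   newNode->next = *firstPtr;   /* previous head comes after newNode */
   *firstPtr = newNode;         /* newNode appears at the start of the list */
   return 1;
}
```

The order of the last two assignments matters. If the first pointer were updated before newNode->next, the old head of the list could no longer be found.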

Case 2: Deletion of an Item from the Middle of a List. The deletion of an item from the middle of a list involves the steps outlined earlier in this reading. For convenience, we repeat the diagram for deletion here.

First, we search the list to find the item that we wish to delete and the item preceding it on the list. This may be done with the following code:

    /* item to remove is not at beginning of list */
    /* start at beginning of list */
    listPtr = (*firstPtr)->next;  /* the current node to search */
    prevPtr = *firstPtr;          /* the node behind listPtr */

    while (listPtr && (strcmp (name, listPtr->data) != 0)) {
      prevPtr = listPtr;
      listPtr = prevPtr->next;
    }

Here, prevPtr always identifies the node just before listPtr on the list given by the *firstPtr pointer. When the loop ends, either listPtr identifies the node containing the name to be removed, with prevPtr identifying the node just before that node, or listPtr is NULL because the name does not appear on the list.
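
To show how this search fits into a whole deletion, the following sketch wraps it in a function and adds the remaining steps of the nonrecursive outline. The function name deleteName and its return value are hypothetical. The sketch assumes that every node was created with malloc and that firstPtr points to the first pointer of the list.

```c
#include <stdlib.h>
#include <string.h>

/* Maximum length of names */
#define strMax 20

struct node
{  char data [strMax];
   struct node * next;
};

/* Delete the first node containing name from the list.
   Returns 1 if a node was deleted, or 0 if name was not found. */
int deleteName (struct node ** firstPtr, const char * name)
{  struct node * listPtr;
   struct node * prevPtr;

   if (*firstPtr == NULL)            /* empty list: nothing to delete */
      return 0;

   if (strcmp (name, (*firstPtr)->data) == 0)
   {  listPtr = *firstPtr;           /* item to remove is first on list */
      *firstPtr = listPtr->next;     /* move first pointer to new first item */
      free (listPtr);                /* throw away the old item */
      return 1;
   }

   /* item to remove is not at beginning of list */
   prevPtr = *firstPtr;
   listPtr = prevPtr->next;
   while (listPtr && (strcmp (name, listPtr->data) != 0)) {
      prevPtr = listPtr;
      listPtr = prevPtr->next;
   }

   if (listPtr == NULL)              /* reached end of list without a match */
      return 0;

   prevPtr->next = listPtr->next;    /* previous item now specifies next item */
   free (listPtr);                   /* throw away the old item */
   return 1;
}
```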

With these cases, the code for insertion and deletion follows the algorithms described earlier in this reading by translating the pictures into C. For the nonrecursive algorithms, special cases arise when the list does not contain any elements or when we must work with the first item on the list.

Simplifications for Recursive Insertion and Deletion

The coding for the recursive insertion and deletion algorithms also follows the outlines rather closely, although the recursion allows some simplifications. For example, the recursive version of the deletion algorithm focuses on the head of the list. If that item is deleted, the head of the list is updated, and the old item is thrown away. If that item is not to be deleted, the same process is applied to the shorter list that starts with the second item. In this setting, the resulting code never has to consider the case in which deletion occurs in the middle of a list.

The recursive deletion procedure, therefore, never needs the prevPtr variable. A similar simplification is possible in the recursive insertion procedure. Although the same insertion and deletion of names can be done either iteratively or recursively, the recursive procedures are considerably shorter in each case.

All of these pieces come together in the program ~walker/c/lists/namelist.c

This document is available on the World Wide Web as

http://www.walker.cs.grinnell.edu/courses/153.sp09/readings/reading-lists-c.shtml

created 15 April 2008 last revised 14 March 2009
For more information, please contact Henry M. Walker at walker@cs.grinnell.edu.