Lists Top RecursionLists and Loops Contents

Lists and Loops

You can download the functions defined or used in this chapter with the following commands:

    wget troll.cs.ua.edu/ACP-C/listLoop.c     #functions from this chapter
    wget troll.cs.ua.edu/ACP-C/listLoop.h

These files will help you run the test code listed in this chapter. You will also need to download the ilist module.

List/Loop patterns

For every array pattern, there is a corresponding list pattern. Sometimes, the pattern is easier to implement with an array than a loop. Sometimes, the opposite is true. The next sections will detail patterns for lists.

The append pattern

Adding values to the front of a list is known as prepending, so join36 prepends a value to a list. Instead of adding new values to the front, we can also add values to the back of the list. Such a procedure is termed appending; Since lists at this point are nodes, we can append a list with either a single node or a list. In this section, we will focus an appending a list with another list:

    void
    append(Node *listA,Node *listB) //listA length must be > 0
        {
        while (tail(listA) != 0)
            {
            listA = tail(listA); //move to the next node in the list
            }
        setTail(listA,listB);
        return;
        }

Here, we start out with the formal parameter listA pointing to the first node in the list passed as listA. Then, as long as listA's next pointer points to a node and not null37, we set listA to the next node in the list by taking the tail. We repeat this process until we reach the last node in the list, whose next pointer does indeed point to null. At this point, we drop out of the loop and set the next pointer of this last node to listB via setTail. Suppose we call append with two lists, alpha and beta:

    append(alpha,beta);

Here is the view just before append walks the first list:

Immediately after dropping out of the loop, the local variable listA points to the last node in the list:

After the reassingment of the tail pointer of the adjusted listA, we can see that alpha now incorporates the nodes of beta:

Note that while head and tail are non-destructive operations, append is a destructive operation, since one pointer in an original list is actually changed. Generally, you can quickly tell if a function is destructive or non-destructive by looking at the return value. If the function implements the procedure pattern, it is likely destructive. If it implements the function pattern, it is likely non-destructive38.

To test append, we need to create some lists:

    //test (compile with ilist.c and listLoop.c)
    #include "listLoop.h"
    Node *a,*b;
    a = join(2,join(4,join(6,0)));
    b = join(1,join(3,join(5,0)));
    displayList(a);
    displayList(b);
    append(a,b);
    displayList(a);

The output for this program is:

    {2}{4}{6}
    {1}{3}{5}
    {2}{4}{6}{1}{3}{5}

The last line in the output comes from the last displayIntegerList. This clearly shows that append is doing its job.

If we wish to append a single value to a list, we can turn the single value into a list first:

    //test (compile with ilist.c and listLoop.c)
    #include "listLoop.h"
    Node *a,*b;
    a = join(2,join(4,join(6,0)));
    b = newNode(1,0);
    append(a,join(b,0)); //incorporate b into a list
    displayList(a);

The process of moving from one node to the next in a list is known as walking the list. Obviously, the longer the list, the longer it takes to walk it. Thus, appending to a list can take much longer than prepending to the same list, since prepending takes the same amount of time regardless of how long the list is. However, as you will learn in a subsequent data structures class, there is a much faster way to append to a list. If you cannot wait to find out this clever implementation of append, search for the phrase "linked list tail pointer" on the interwebs.

We can use the idea of walking a list to implement some more useful list functions. The getListIndex function returns the value at the ith node in a given list:

    int
    getListIndex(Node *items,int index)
        {
        while (index > 0)
            {
            items = tail(items);
            --index;
            }
        return head(items);
        }

Note that taking the tail of a list does not alter the original list in any way; the list that is passed in is unchanged by this function39. Finally, consider if there are fewer nodes in the list than is specified by the index. This implementation, like many C built-in functions, does no error checking. What happens if the list is too short for the given index is undefined (but it will likely be very bad). The analogous setListIndex is left as an exercise.

The search pattern

This idea of walking lists is an extremely important concept; so important that we will review the two "walks" that are commonly needed when manipulating lists. The first is walking to a certain location in list. Usually, one walks to index (covered previously) or to a value. Of course, walking to a value is simply the search pattern, implemented for lists. Here is a search function that works on Integer lists:

    Node *
    findInteger(int value,Node *items)
        {
        while (items != 0 && head(items) != value)
            {
            items = tail(items);
            }
        return items;
        }

Note that this version returns the node containing the value if the value is found and the null pointer otherwise. Because of the possibility that a value may not be in the list, we must add the condition items != 0 to the while loop test. If we didn't, then items would eventually reach 0 and the head operation on the null pointer would likely generate a crash. Sometimes, find would be written with a simpler loop test, but complicated by a return from the body of the loop:

    Node *
    findInteger2(int value,Node *items)
        {
        while (items != 0)
            {
            if (head(items) == value)
                 {
                 return items;
                 }
            items = tail(items);
            }
        return 0;
        }

These two implementations are semantically equivalent; the former is usually preferred stylistically, but the latter is likely more common.

The trailing walk pattern

The second walk one is likely to encounter is a walk to the next-to-the-last node in a list, which is useful in many situations: removing the last item, adding an item just prior to the last item, and so on.

One must walk the list to get to the penultimate node. Here is one attempt; it keeps two pointers when walking the list, a leading pointer and a trailing pointer that stays one node behind the leader. The walk ends when the next pointer of the leading node becomes null. We call this pattern the trailing walk pattern:

    Node *
    getPenultimateNode(Node *items)
        {
        Node *trailing = 0;
        Node *leading = items;
        while (tail(leading) != 0) //when to stop walking
            {
            trailing = leading;
            leading = tail(leading);
            }
        return trailing;
        }

If we walked until the lead node became null, then the trailing node would be the last node in the list, not the next-to-the-last node. However, checking the next pointer of a node is a dangerous proposition. What happens in the case of an empty list? How many nodes must be in a list for this function to work properly? Highlight the following line to see the answers:

[ When passed an empty list, the tail operation in the loop test will generate an error. Therefore, there must be at least one node in the list.]

The above approach can be simplified by checking if the trailing pointer's next pointer points to a node whose next pointer is null. Although the check is a bit more complicated, it obviates the need for the leading node:

    Node *
    getPenultimateNode2(Node *items)
        {
        Node *trailing = items;
        while (tail(tail(trailing)) != 0)
            {
            trailing = tail(trailing);
            }
        return trailing;
        }

The two versions of getPenultimateNode are similar, but not exactly equivalent. How many nodes must be in a list for the second version to work properly? Highlight the following line to see the answers:

[ There must be at least two nodes in the list.]

To test these implementations, we will create a list and then print the value of the penultimate node:

    //test (compile with ilist.c and listLoop.c)
    #include "listLoop.h"
    Node *a,*n,*m;
    a = join(newInteger(2),join(newInteger(4),join(newInteger(6),0)));
    n = getPenultimateNode(a);
    printf("n is %d\n",getIntegerValue(head(n)));
    m = getPenultimateNode2(a);
    printf("m is %d\n",getIntegerValue(head(m)));

Both version produce 4, as expected.

An application of the trailing walk pattern is to insert a value into an ordered list so that the list remains ordered40. For example, suppose we have an ordered list consisting of the numbers 1, 3, 8, 14, and 17:

If we wish to insert 10 into the list, we must place the 10 between the 8 and the 13 so that the list values remain in increasing order (from left to right). To simplify the making of an integer Using getPenultimateNode as a starting point, we have:

    void
    insertIntegerInOrder(int value,Node *items)
        {
        Node *trailing = 0;
        Node *leading = items;
        while (...) //when to keep walking
            {
            trailing = leading;
            leading = tail(leading);
            }
        //insert new value in between trailing and leading
        ...
        }

The ellipses mark where changes to the trailing value pattern need to be made. The first ellipsis (the while test) should force the walk to stop when the correct insertion point is found. Clearly, that is when leading value is greater than the value to be inserted. The second ellipsis concerns how the actual insertion is to be performed. We know we will need to:

All that's left then is to fill in the blanks. Here is the result:

    void
    insertInOrder(int value,Node *items)
        {
        Node *trailing = 0;
        Node *leading = items;
        while (head(leading) < value) //when to keep walking
            {
            trailing = leading;
            leading = tail(leading);
            }
        //insert new value in between trailing and leading
        Node *n = newNode(value,0);
        setTail(n,leading);     //place n before leading
        setTail(trailing,n);    //place n after trailing
        }

Note that we want to stop when the leading value is greater than the value to be inserted. Since we are working with a while loop, we need to reverse the logic so that the walk continues as long as the leading value is less than the new value41.

Let's test our code:

    //test (compile with ilist.c and listLoop.c)
    #include "listLoop.h"
    int numbers[] = {1,3,8,13,14,17};
    Node *a = arrayToList(numbers,sizeof(numbers)/sizeof(int));
    insertInOrder(10,a);
    displayList(a);

Compiling and running this code yields:

    {1}{3}{8}{10}{13}{14}{17}

It appears our code works correctly! The 10 is nestled nicely between the 8 and the 13. Very exciting! Unfortunately, the excitement of getting code to work seduces both novice and experienced Computer Scientists alike into thinking the task is finished. Many times, including this case in particular, initial success only means the basic idea is correct, not that the code is correct for all possible inputs. Indeed, for our implementation of insertOrder, there are edge cases for which this code fails. An edge case is a set of inputs that force the code to do as much (or as little) work as possible. For insertInOrder, what inputs causes the function to do as much work as possible? The only place where a variable amount of work is performed is the loop that walks the list. In order to make the loop go as many times as possible, it is clear that we would need to insert a value at the very end of the list. Changing the line:

    insertInOrder(20,a);    //was: insertInOrder(10,a);

yields the following result when the program is run:

    Segmentation fault      (core dumped) ./a.out

Likewise, making the loop do as little work as possible yields errors in the code. What inputs would cause the loop run very little or not at all? Obviously, if the value to be inserted is smaller than any value in the list, the loop does not need to be run. Less obviously, what if the given list is empty? Fixing insertInOrder to handle these edge cases is left as an exercise.

The counting pattern

The counting pattern is very important for lists, since we can use that pattern to deduce the number of items in a list:

    int
    length(Node *items)
        {
        int count = 0;
        while (items != 0)                //typical list walking condition
            {
            ++count;
            items = tail(items);          //typical list walking step
            }
        }

Traditionally, the function that counts the number of items in a list is called length. As with the counting pattern for arrays, each "step" taken results in a counter being incremented.

The filtered-accumulation pattern

A filtered accumulation looks similar to the counting pattern. Here is a function that totals up all the even values in a list:

    int
    sumEvenIntegers(Node *items)
        {
        int total = 0;
        while (items != 0)
            {
            int v = head(items);
            if (v % 2 == 0)
                total += v;
            items = tail(items);
            }
        return total;
        }

The extreme pattern

Once you have the hang of walking a list, most patterns become trivial to implement. Here is an instance of the extreme pattern:

    int
    getLargest(Node *items)
        {
        int largest = head(items); //assume first is largest
        while (items != 0)
            {
            int v = head(items);
            if (v > largest)
                largest = v;
            items = tail(items);
            }
        return largest;
        }

This implementation looks very much like the extreme pattern for arrays, only translated to list walking.

Surprisingly or, maybe, not surprisingly, the extreme index pattern is much more difficult to implement for lists than for arrays. This is because the basic walk of a list ignores the index of a particular node. In general, any processing that requires the manipulation of indices will be more complicated with a list as compared to an array. The implementation of the extreme index pattern is left as an exercise.

The copy, filter and map patterns

Lists lend themselves to non-destructive copying, filtering, and mapping. As the original list is walked and values processed, a new list holding the original, filtered, or mapped values is built up. One slight problem arises when loops are used to walk the list. Since lists are grown using join, the elements in the new list are found in the opposite order as compared to their associated elements in the original list. If the order of the result doesn't matter, then lists and loops go well together for this kind of processing. If the order does matter, then another way of walking list, using recursion, can be used to preserve order. We will cover recursion in the next chapter.

Here is code which copies a list.

    Node *
    copyList(Node *items)
        {
        Node *result = 0;   //resulting list starts out empty
        Node *spot = items;
        while (spot != 0)           //walk the list
            {
            result = join(head(spot),result); //add current value to result
            spot = tail(spot);                //take a step
            }
        return result;
        }

In almost all list processing functions, we see the basic walk of the list. Indeed, should you be tasked to define a list processing function, you would be well advised to start with a basic walk and then modify as needed.

Testing the copyList function reveals the reordering of the original values:

    //test (compile with listLoop.c)
    #include "listLoop.h"
    int numbers[] = {1,3,8,13,14,17};
    Node *a = arrayToList(numbers,sizeof(numbers)/sizeof(int));
    Node *b = copyList(a);
    displayList(a);
    displayList(b);

Compiling and running this code yields:

    {1}{3}{8}{13}{14}{17}
    {17}{14}{13}{8}{3}{1}

Filtering and mapping are based off of the copy pattern. Here is a function the filters out odd-valued integers:

    Node *
    extractEvens(Node *items)
        {
        Node *result = 0;
        while (items != 0)
            {
            int v = head(items);
            if (isEven(v))
                result = join(head(items),result);
            items = tail(items);
            }
        return result;
        }

Note how the resulting list is grown every time an even valued item is found in the original list. Testing the function:

    //test (compile with ilist.c and listLoop.c)
    #include "listLoop.h"
    int numbers[] = {1,3,8,13,14,17};
    Node *a = arrayToList(numbers,sizeof(numbers)/sizeof(int));
    Node *b = extractEvenIntegers(a);
    displayIntegerList(b);
    displayIntegerList(b);

yields:

    {1}{3}{8}{13}{14}{17}
    {14}{8}

The even numbers from the original list are found in the resulting list in the opposite order.

The shuffle and merge patterns

As with arrays, shuffling and merging lists using loops is rather complicated. So much so that we will not even bother discussing loop-with-list versions. However, using recursion with lists greatly simplifies this type of processing, as we shall see in the next chapter.

The fossilized pattern

The primary reason for accidentally implementing the fossilized pattern is to forget the while loop update. For example, while this loop:

    while (items != 0)
        {
        //loop body here
        ...
        items = tail(items);      //update, often accidentally omitted
        }

may have been intended, what sometimes happens is the update to items is omitted. Since items is never updated, it never advances to the null pointer (unless, of course, it was an empty list to begin with). This leads to an infinite loop.

Freeing lists

Since our lists are dynamically allocated, we need to free them when we are done with them:

    void
    freeList(Node *items)
        {
        Node *spot;
        while(items != 0)
            {
            spot = items;
            items = tail(items);
            free(spot);                //free the current node
            }
        }

The only complication to this procedure is the need to save the lead node of the list before advancing to the next node. This is because, once we free something, we cannot reference any part of it. Thus, the sequence:

    free(items);            //free the current node
    items = tail(items);    //WRONG!

will end in grief. The second statements asks for a part of the node (the value of the next pointer), after the node has been freed. Note that this routine does not free the values of a node. For integer nodes, this is not an issue. For string-valued nodes, where the strings were malloc-ed, the freeList function would be modified to free those strings:

    void
    freeList(Node *items)
        {
        Node *spot;
        while(items != 0)
            {
            spot = items;
            items = tail(items);
            free(spot->value);
            free(spot);                //free the current node
            }
        }

Why Lists?

You may be asking, why go through all this trouble to implement lists when arrays seem to do every thing lists can do. The reason comes down to the fact that no data structure does all things well. For example, a data structure may be very fast when putting values in, but very slow in taking values out. Another kind of data structure may be very slow putting values in but very fast taking values out. This is why there are quite a number of different data structures, each with their own strengths.

Arrays versus Lists:

Problems

  1. Assume a list holds the values 3, 8 and 10. Draw a box and pointer diagram of this list.
  2. Why does the for loop in the intsToList function (in integers.c):
        for (i = size-1; i >= 0; --i)
    

    decrement the loop index?

  3. Assume a list holds the values 3 and 8. Consider appending the value 5. Draw a set of box and pointer diagrams, similar to those used in the discussion of join, that illustrate the action of append. You should have a drawing of the situation before the while loop begins, one for each update of the variable node while the loop is running, one after the new node is created, and one showing the list just before the function returns. Label your drawings appropriately.
  4. Define a function named appendValue, which takes a list and an int value and appends the value to a list of Integer values. Your function should look very similar to append.
  5. The module lists.c exhibits an important principle in Computer Science, abstraction. The list operations treat nodes as abstract objects; that is to say, lists do not know nor care about how nodes are constructed. Verify this is true by renaming the variables in the Node structure and updating the accessors and mutators of nodes accordingly. Now test the list operations to see that they work exactly as before.
  6. The list operation getListIndex as found in the lists.c module exhibits another important concept in Computer Science, generalization. Since both getListIndex and setListIndex need to walk the list, that list-waking code was generalized into the function getListIndexedNode. Modify the lists.c module so that getListIndex and setListIndex each walk the list. Both operations should function as before, but you should notice how similar they look. When you see such similarity, the similar code is a candidate for generalization.
  7. Define a function named reverse, which takes a list as an argument and returns a new list with the same items as the given list, but in the opposite order.
  8. Define a function named copy2, which takes a list as an argument and returns a copy of the list. The new list should have the same ordering as the original list.
  9. Define a function named chop, which takes a list as an argument and destructively removes the last element of a list. Your function should look similar to getPenultimateNode.
  10. Define a function named equals, which takes two lists as arguments and returns whether or not the two lists have the same elements in the same order.
  11. The linked lists developed in this chapter are known as singly-linked lists, a class of lists with a single pointer links a node to the next node in the list. Another class of lists, doubly-linked lists, provide an additional pointer from a node to its predecessor. Define a constructor for nodes upon which doubly-linked lists could be based.
  12. Fix the given version of insertIntegerInOrder so that it correctly handles edge cases.
  13. Implement a function that finds the index of the largest element in the list. Do not use getListIndex or anything like it.
  14. Consider the version of insertIntegerInOrder given in this chapter. If a value to be inserted matches a value already in the list, does the new value end up before or after the old value? How would you change the code to get the opposite behavior?

lusth@cs.ua.edu


Lists Top RecursionLists and Loops Contents