Lists Top Comparing Recursion and Looping Contents

Recursion

In the chapter on conditionals, we learned about if statements. When we combine if statements with functions that call themselves, we obtain a powerful programming paradigm called recursion.

Recursion is a form of looping; when we loop, we evaluate code over and over again. We use a conditional to decide whether to continue the loop or to stop. Recursive loops are often easier to write and understand, as compared to the iterative loops such as whiles and fors, which you learned about in a previous chapter. In some programming languages, iterative loops are preferred as they use much less computer memory and are slightly faster as compared to recursive loops. In other languages, this is not the case at all. In general, there is no reason to prefer iterative loops over recursive ones, other than this memory issue and slight loss of performance. Any iterative loop can be written as a recursion and any recursion can be written as an iterative loop. Use a recursion if that makes the implementation more clear, otherwise, use an iterative loop.

Many mathematical functions are easy to implement recursively, so we will start there. Recall that the factorial of a number n is:

        [Equation 1]      n! = n * (n - 1) * (n - 2) * ... * 2 * 1

Consider writing a function which computes the factorial of a positive integer. For example, if the function were passed the value of 4, it should return the value of 24 since 4! is 4*3*2*1 or 24. To apply recursion to solve this problem or any problem, for that matter, it must be possible to state the solution of a problem so that it references a simpler version of the problem. For factorial, the factorial of a number can be stated in terms of a simpler factorial. Consider the two equations below:

        [Equation 2]      n! = 1 if n == 0
        [Equation 3]      n! = n * (n - 1)! if n > 0

Equation 2 says that the factorial of zero is one38. Equation 3 states that the factorial of any other (positive) number is obtained by multiplying the number by the factorial of one less than that number. Together, these two equations form a recurrence equation that describes the computation of factorial.

After some study, you should be able to see that this new way of describing factorial is equivalent to Equation 1, the one that that used ellipses39. Equation 1, gets the basic idea of factorial across but is not very precise. For example, how would you compute the factorial of three using Equation 1?

The second form with the two equations is particularly well suited for implementation as a function in a computer program:

    def factorial(n):
        if (n == 0):
            return 1
        else:
            return n * factorial(n - 1)

Note how the factorial function precisely implements the recurrence equation. Convince yourself that the function really works by tracing the function call:

    factorial(3)

Decomposing the call, we find that:

    factorial(3) is 3 * factorial(2)

since n, having a value of 3, is not equal to 0. Thus the second block of the if is evaluated and we can replace n with 3 and n-1 with 2. Likewise, we can replace factorial(2) by 2 * factorial(1), yielding:

    factorial(3) is 3 * 2 * factorial(1)

since n, now having a value of 2, is still not zero. Continuing along this vein, we can replace factorial(1) by 1 * factorial(0), yielding:

    factorial(3) is 3 * 2 * 1 * factorial(0)

Now in this last call to factorial, n does have a value of zero, so we can replace factorial(0) with its immediate return value of one:

    factorial(3) is 3 * 2 * 1 * 1

Thus, factorial(3) has a value of six:

    >>> factorial(3)
    6

as expected.

In contrast, here is a function which computes factorial using a for loop:

    def factorial(n):
        total = 1;
        for i in range(1,n+1):
           total = total * i
        return total

We can see from this version that factorial is an accumulation. This version also more closely follows Equation 1. Note that we have to extend the upper end of the range by one in order to get n included in the accumulation, since the upper limit of the range function is exclusive.

The parts of a recursive function

Recursive approaches rely on the fact that it is usually simpler to solve a smaller problem than a larger one. In the factorial problem, trying to find the factorial of n - 1 is a little bit simpler40 than finding the factorial of n. If finding the factorial of n - 1 is still too hard to solve easily, then find the factorial of n - 2 and so on until we find a case where the solution is dead easy. With regards to factorial, this is when n is equal to zero. The easy-to-solve code (and the values that get you there) is known as the base case. The find-the-solution-using-a-simpler-problem code (and the values that get you there) is known as the recursive case. The recursive case usually contains a call to the very function being executed. This call is known as a recursive call.

Most well-formed recursive functions are composed of at least one base case and at least one recursive case.

The greatest common divisor

Consider finding the greatest common divisor, the gcd, of two numbers. For example, the gcd of 30 and 70 is 10, since 10 is the largest number that divides both 30 and 70 evenly. The ancient Greek philosopher Euclid devised a solution for this problem that involves repeated division. The first division divides the two numbers in question, saving the remainder. Now the divisor becomes the dividend and the remainder becomes the divisor. This process is repeated until the remainder becomes zero. At that point, the current divisor is the gcd. We can specify this as a recurrence equation, with this last bit about the remainder becoming zero as our base case:

gcd(a,b) is b if a divided by b has a remainder of zero
gcd(a,b) is gcd(b,a % b) otherwise

In this recurrence equation, a and b are the dividend and the divisor, respectively. Recall that the modulus operator % returns the remainder. Using the recurrence equation as a guide, we can easily implement a function for computing the gcd of two numbers.

    def gcd(dividend,divisor):
        if (dividend % divisor == 0):
            return divisor
        else:
            return gcd(divisor,dividend % divisor)

Note that in our implementation of gcd, we used more descriptive variables than a and b. We can improve this function further, by noting that the remainder is computed twice, once for the base case and once again to make the recursive call. Rather than compute it twice, we compute it straight off and save it in an aptly named variable:

    def gcd(dividend,divisor):
        remainder = dividend % divisor
        if (remainder == 0):
            return divisor
        else:
            return gcd(divisor,remainder)

Look at how the recursive version turns the divisor into the dividend by passing divisor as the first argument in the recursive call. By the same token, remainder becomes divisor by nature of being the second argument in the recursive call. To convince one's self that the routine really works, one can modify gcd to "visualize" the arguments. On simple way of visualizing the action of a function is to add a print statement:

    def gcd(dividend,divisor):
        remainder = dividend % divisor
        print("gcd:",dividend,divisor,remainder)
        if (remainder == 0):
            return divisor
        else:
            return gcd(divisor,remainder)

After doing so, we get the following output:

    >>> gcd(66,42)
    gcd: 66 42 24
    gcd: 42 24 18
    gcd: 24 18 6
    gcd: 18 6 0
    6

Note, how the first remainder, 24, keeps shifting to the left. In the first recursive call, the remainder becomes divisor, so the 24 shifts one spot to the left. On the second recursive call, the current divisor, which is 24, becomes the dividend, so the 24 shifts once again to the left.

We can also write a iterative version of gcd:

    def gcd(dividend,divisor):
        while (divisor != 0):
            print(divisor,dividend)
            temp = dividend % divisor
            dividend = divisor
            divisor = temp
        return dividend

While the iterative version of factorial was only slightly more complicated than the recursive version, with gcd, we start to see more of an advantage using the recursive formulation. For instance, where did the temp variable come from and why is it necessary41?

The Fibonacci sequence

A third example of recursion is the computation of the nth Fibonacci number. The Fibonacci series looks like this:

    n            :  0   1   2   3   4   5   6   7   8    ...
    Fibonacci(n) :  0   1   1   2   3   5   8  13  21    ...

and is found in nature again and again42. From this table, we can see that the 7th Fibonacci number is 13. In general, a Fibonacci number is equal to the sum of the previous two Fibonacci numbers. The exceptions are the zeroth and the first Fibonacci numbers which are equal to 0 and 1 respectively. Voila! The recurrence case and the two base cases have jumped right out at us! Here, then, is a recurrence equation which describes the computation of the nth Fibonacci number.

fib(n) is 0 if n is zero
fib(n) is 1 if n is one
fib(n) is fib(n - 1) + fib(n - 2) otherwise

Again, it is straightforward to convert the recurrence equation into a working function:

    # compute the nth Fibonacci number
    # n must be non-negative!
    
    def fibonacci(n):
        if (n == 0):
            return 0
        elif (n == 1):
            return 1
        else:
            return fibonacci(n-1) + fibonacci(n-2)

Our implementation is straightforward and elegant. Unfortunately, it's horribly inefficient in Python. Unlike our recursive version of factorial which recurred about as many times as the size of the number sent to the function, our recursive version of Fibonacci will recur many, many more times than the size of its input. Here's why.

Consider the call to fib(6). Tracing all the recursive calls to fib, we get:

    fib(6) is fib(5) + fib(4)

Replacing fib(5) with fib(4) + fib(3), we get:

    fib(6) is fib(4) + fib(3) + fib(4)

We can already see a problem, we will compute fib(4) twice, once from the original call to fib(6) and again when we try to find fib(5). If we write down all the recursive calls generated by fib(6) with each recursive call indented from the previous, we get a structure that looks like this:

    fib(6)
        fib(5)
            fib(4)
                fib(3)
                    fib(2)
                        fib(1)
                        fib(0)
                    fib(1)
                fib(2)
                    fib(1)
                    fib(0)
            fib(3)
                fib(2)
                    fib(1)
                    fib(0)
                fib(1)
        fib(4)
            fib(3)
                fib(2)
                    fib(1)
                    fib(0)
                fib(1)
            fib(2)
                fib(1)
                fib(0)

We would expect, based on how the Fibonacci sequence is generated, to take about six "steps" to calculate fib(6). Instead, ultimately there were 13 calls43 to either fib(1) or fib(0). There was a tremendous amount of duplicated and, therefore, wasted effort. An important part of Computer Science is how to reduce the wasted effort. One way to keep the recursive nature without the penalty of redundant computations is to cache previously computed values. Another way is to use an iterative loop:

    def fib(n):
        previous = 0
        current = 1
        for i in range(0,n,1):
            temp = previous
            previous = current
            current = previous + temp
        return previous

Here, the recursive version, for all its faults, is much, much easier to understand. Complications of the iterative version include, why is previous returned and not current? And where did the variable temp come from? In the chapter that compares recursion to loops, we will see a way to combine the clarity of recursion with the efficieny of loops.

Manipulating arrays and lists with recursion

Recursion, arrays, and lists go hand-in-hand. What follows are a number of recursive patterns involving arrays and lists that you should be able to recognize and implement.

For the following discussion, assume the ahead function returns the first item of the given array, while the atail function returns an array composed of all the items in the given array except for the first element. If the array is empty, it will have a value of [].

By the way, the ahead and atail functions are easily implemented in Python:

    def ahead(items): return items[0]
    def atail(items): return items[1:]  #slicing, which copies!

Note that the ahead and atail functions are analogous to the ListHead and ListTail functions of the previous chapter on lists. Note also that we will use the term collection to stand in for either a list or an array.

The counting pattern

The counting pattern is used to count the number of items in a collection. If the collection is empty, then its count of items is zero. The following function counts and ultimately returns the number of items in an array:

    def count(items):
        if (items == []): # base case
            return 0
        else:
            return 1 + count(atail(items))

The functions works on the observation that if you count the number of items in the tail of a array, then the number of items in the entire array is one plus that number. The extra one accounts for the head item that was not counted when the tail was counted.

The list version is similar:

    def count(items):
        if (items == None): # base case
            return 0
        else:
            return 1 + count(tail(items))

The only code difference in the list version are the use of None instead of [] to signify empty and tail instead of atail. However, the performance difference between the two versions can be huge! For a list or an array with n items, the list version will run n times faster. So if your collection holds a million items, the array version will work on the order of one million times slower. This is due to how atail works versus how tail works, as discussed in the previous chapter. For small numbers of items, however, the array version will run quickly enough.

The accumulate pattern

The accumulate pattern is used to combine all the values in a collection. The following function performs a summation of the list values:

    def sum(items):
        if (items == None): #base case
            return 0
        else:
            return head(items) + sum(tail(items))

Note that the only difference between the count function and the sum function is the recursive case adds in the value of the head item, rather than just counting the head item. That the functions count and sum look similar is no coincidence. In fact, most recursive functions, especially those working on collections, look very similar to one another.

The filtered-count and filtered-accumulate patterns

A variation on the counting and accumulate patterns involves filtering. When filtering, we use an additional if statement to decide whether or not we should count the item, or in the case of accumulating, whether or not the item ends up in the accumulation.

Suppose we wish to count the number of even items in a list:

    def countEvens(items):
        if (items == None): #base case
            return 0
        elif (head(items) % 2 == 0):
            return 1 + countEvens(tail(items))
        else:
            return 0 + countEvens(tail(items))

The base case states that there are zero even numbers in an empty list. The first recursive case simply counts the head item if it is even and so adds 1 to the count of even items in the remainder of the list. The second recursive case does not count the head item (because it is not even) and so adds in a 0 to the count of the remaining items. Of course, the last return would almost always be written as:

    return countEvens(tail(items))

As another example of filtered counting, we can pass in a value and count how many times that value occurs:

    def occurrences(target,items):
        if (items == None):
            return 0
        elif (head(items) == target):
            return 1 + occurrences(target,tail(items))
        else:
            return occurrences(target,tail(items))

An example of a filtered-accumulation would be to sum the even-numbered integers in a list:

    def sumEvens(items):
        if (items == None):
            return 0
        elif (isEven(head(items))):
            return head(items) + sumEvens(tail(items))
        else:
            return sumEvens(tail(items))

where the isEven function is defined as:

    def isEven(x):
        return x % 2 == 0

The filter pattern

A special case of a filtered-accumulation is called filter. Instead of summing the filtered items (for example), we collect the filtered items into a collection. The new collection is said to be a reduction of the original collection.

Suppose we wish to extract the even numbers from a list. The structure of the code looks very much like the sumEvens function in the previous section, but instead of adding in the desired item, we join the item to the reduction of the tail of the array:

    def extractEvens(items):
        if (items == None):
            return None
        elif (isEven(head(items))):
            return join(head(items),extractEvens(tail(items)))
        else:
            return extractEvens(tail(items))

Given a list of integers, extractEvens returns a (possibly empty) list of the even numbers:

    >>> a = ListCreate(4,2,5,2,7,0,8,3,7)
    >>> extractEvens(a)
    [4, [2, [2, [0, [8, None]]]]]

    >>> b = ListCreate(1,3,5,7,9)
    >>> extractEvens(b)
    []

We see again that our list representation depends on arrays.

The map pattern

Mapping is a task closely coupled with that of reduction, but rather than collecting certain items, as with the filter pattern, we collect all the items. As we collect, however, we transform each item as we collect it. The basic pattern looks like this:

    def map(f,items):
        if (items == None): 
            return None
        else:
            return join(f(head(items)),map(f,tail(items)))

Here, function f is used to transform each item in the recursive step.

Suppose we wish to subtract one from each element in an array. First we need a transforming function that reduces its argument by one:

    def decrement(x): return x - 1

Now we can "map" the decrement function over an array of numbers:

    >>> a = ListCreate(4,3,7,2,4,3,1)
    >>> map(decrement,a)
    [3, [2, [6, [1, [3, [2, [0, None]]]]]]]

Both map and filtering (reduce) are used heavily by Google in programming their search strategies.

The search pattern

The search pattern is a slight variation of filtered-counting. Suppose we wish to see if a value is present in a list. We can use a filtered-counting approach and if the count is greater than zero, we know that the item was indeed in the list.

    def find(target,items):
        return occurrences(target,items) > 0

In this case, occurrences helps find do its job. We call such functions, naturally, helper functions. We can improve the efficiency of find by having it perform the search, but short-circuiting the search once the target item is found. We do this by turning the first recursive case into a second base case:

    def find(target,items):
        if (items == None):
            return False
        elif (head(items) == target):    # short-circuit!
            return True
        else:
            return find(target,tail(items))

When the list is empty, we return false because if the item had been list, we would have hit the second base case (and returned true) before hitting the first. If neither base case hits, we simple search the remainder of the list (the recursive case). If the second base case never hits, the first base case eventually will.

The shuffle pattern

Sometimes, we wish to combine two lists. This is easy to do with the append operator:

   append(list1,list2)

This places the first element in the second list after the last element in the first list. However, many times we wish to intersperse the elements from the first list with the elements in the second list. This is known as a shuffle, so named since it is similar to shuffling a deck of cards. When a deck of cards is shuffled, the deck is divided in two halves (one half is akin to the first list and the other half is akin to the second list). Next the two halves are interleaved back into a single deck (akin to the resulting third list). Note that while appending is destructive, in that it changes the first list, shuffling is non-destructive, in the neither of the shuffled lists are modified.

We can use recursion to shuffle two lists. If both lists are exactly the same length, the recursive function is easy to implement using the accumulate pattern:

    def shuffle(list1,list2):
        if (list1 == None):
            return None
        else:
            rest = shuffle(tail(list1),tail(list2))
            return join(head(list1),join(head(list2),rest))

If list1 is empty (which means list2 is empty since they have the same number of elements), the function returns the empty, since shuffling nothing together yields nothing. Otherwise, we shuffle the tails of the lists (the result is stored in the temporary variable rest), then join the the first elements of each list to the shuffled remainders.

If you have ever shuffled a deck of cards, you will know that it is rare for the deck to be split exactly in half prior to the shuffle. Can we amend our shuffle function to deal with this problem? Indeed, we can, by simply placing the extra cards (list items) at the end of the shuffle. We don't know which list (list1 or list2) will go empty first, so we test for each list becoming empty in turn:

    def shuffle2(list1,list2):
        if (list1 == None):
            return list2
        elif (list2 == None):
            return list1
        else:
            rest = shuffle2(tail(list1),tail(list2))
            return join(head(list1),join(head(list2),rest))

If either list is empty, we return the other. Only if both are not empty do we execute the recursive case.

One can make shuffling even simpler by manipulating the recursive call that shuffles the remainder of the lists. If we flip the order of the lists in the recursive call, we only need to deal with the first list becoming empty:

    def shuffle3(list1,list2):
        if (list1 == None):
            return list2
        else:
            return join(head(list1),shuffle3(list2,tail(list1)))

Note that in the recursive call, we take the tail of list1 since we joined the head of list1 to the resulting shuffle. The base case returns list2 because if list1 is empty, the "shuffle" of the remaining elements is just list2. Also, even if list1 and list2 are both empty, the base case test returns the correct value, None.

Finally, note how much simpler it is to shuffle two lists using recursion as compared to shuffling two arrays with loops.

The merge pattern

With the shuffle pattern, we always took the head elements from both lists at each step in the shuffling process. Sometimes, we wish to place a constraint of the choice of elements. For example, suppose the two lists to be combined are sorted and we wish the resulting list to be sorted as well. The following example shows that shuffling does not always work:

    >>> a = ListCreate(1,4,6,7,8,11)
    >>> b = ListCreate(2,3,5,9)

    >>> c = shuffle3(a,b)
    [1, [2, [4, [3, [6, [5, [7, [9, [8, [11, None]]]]]]]]]]

The merge pattern is used to ensure the resulting list is sorted and is based upon the filtered-accumulate pattern. We only accumulate an item if it is the smallest item in the two lists:

    def merge(list1,list2):
        if (list1 == None):
            return list2
        elif (list2 == None):
            return list1
        elif (head(list1) < head(list2)):
            return join(head(list1),merge(tail(list1),list2))
        else:
            return join(head(list2),merge(list1,tail(list2)))

As with shuffle2, we don't know which list will become empty first, so we check both in turn.

In the first recursive case, the first element of the first list is smaller than the first element of the second list. So we accumulate the first element of the first list and recur, sending the tail of the first list because we have used/accumulated the head of that list. The second list we pass unmodified, since we did not use/accumulate an element from the second list.

In the second recursive case, we implement the symmetric version of the first recursive case, focusing on the second list rather than the first.

The generic merge pattern

The merge function in the previous section hard-wired the comparison operator to <. Many times, the elements of the lists to be merged cannot be compared with < or, if they can, a different operator, such as >, might be desired. The generic merge solves this problem by allowing the caller to pass in a comparison function as a third argument:

    def genericMerge(list1,list2,pred):

where pred is the formal parameter that holds a predicate function44. Now we replace the < in merge with a call to pred.

    def genericMerge(list1,list2,pred):
        if (list1 == None):
            return list2
        elif (list2 == None):
            return list1
        elif (pred(head(list1),head(list2))):
            return join(head(list1),genericMerge(tail(list1),list2,pred))
        else:
            return join(head(list2),genericMerge(list1,tail(list2),pred))

The pred function, which is passed the two head elements, returns True, if the first element should be accumulated, and False, otherwise.

We can still use genericMerge to merge two sorted lists of numbers (which can be compared with <) by using the operator module. The operator module provides function forms of the operators +, -, <, and so on.

    >>> import operator
    >>> a = ListCreate(1,4,6,7,8,11)
    >>> b = ListCreate(2,3,5,9)

    >>> c = genericMerge(a,b,operator.lt)
    [1, [2, [3, [4, [5, [6, [7, [8, [9, [11, None]]]]]]]]]]

The genericMerge function is a generalization of merge. When we generalize a function, we modify it so it can do what it did before plus new things that it could not. Here, we can still have it (genericMerge) do what it (merge) did before, by passing in the less than comparison operator. We can use it to merge two lists sorted in increasing order by passing in the greater than comparison operator.

The fossilized pattern

If a recursive function mistakenly never makes the problem smaller, the problems is said to be fossilized. Without ever smaller problems, the base case is never reached and the function recurs45 forever. This condition is known as an infinite recursive loop. Here is an example:

    def factorial(n):
        if (n == 0):
            return 1
        else:
            return n * factorial(n)

Since factorial is solving the same problem over and over, n never gets smaller so it never reaches zero. Fossilizing the problem is a common error made by both novice and expert programmers alike.

The bottomless pattern

Related to the fossilized pattern is the bottomless pattern. With the bottomless pattern, the problem gets smaller, but the base case is never reached. Here is a function that attempts to divide a positive number by two, by seeing how many times you can subtract two from the number:46

    def div2(n):
        if (n == 0):
            return 0
        else:
            return 1 + div2(n - 2)

Things work great for a while:

    >>> div2(16)
    8

    >>> div2(6)
    3

    >>> div2(134)
    67

But then, something goes terribly wrong:

    >>> div2(7)
    RuntimeError: maximum recursion depth exceeded in cmp

What happened? To see, let's visualize our function, as we did with the gcd function previously, by adding a print statement:

    def div2(n):
        print("div2: n is",n)
        if (n == 0):
            return 0
        else:
            return 1 + div2(n - 2)

Now every time the function is called, both originally and recursively, we can see how the value of n is changing:

    >>div2(7)
    div2: n is 7
    div2: n is 5
    div2: n is 3
    div2: n is 1
    div2: n is -1
    div2: n is -3
    div2: n is -5
    div2: n is -7
    ...
    RuntimeError: maximum recursion depth exceeded in cmp

Now we can see why things went wrong, the value of n skipped over the value of zero and just kept on going. The solution is to change the base case to catch odd (and even) numbers:

    def div2(n):
        if (n < 2):
            return 0
        else:
            return 1 + div2(n - 2)

Remember, when you see a recursion depth exceeded error, you likely have implemented either the fossilized or the bottomless pattern.

Code

For Linux and Mac users, the code from this chapter can be found here.

For Windows users, look here (you will need to save the file as recursion.py).

lusth@cs.ua.edu


Lists Top Comparing Recursion and Looping Contents