![]() |
![]() |
![]() | Contents |
Recall that what we are calling arrays are called lists by the Python community25. Now we will introduce true lists; these lists will be our first attempt at building a data structure. A data structure is simply a bundling of pieces of data together in a single entity along with a set of operations (implemented using functions) to manipulate such entities. Common operations for data structures include putting data into the structure and taking data out. Moreover, the data structure and the functions which implement the operations are constructed so that the operations do not take an inordinate amount of time as more and more data is placed into the structure.
Our lists will use an intermediate data structure known as a node. A node is a simple structure that holds two pieces of data. The first data item is some arbitrary value, while the second is another node. Graphically, nodes are represented as a box with two arrows. The downward pointing arrow typically points to the value while the rightward pointing arrow points to other node.
In actuality, the data items are the addresses in memory where the values and nodes reside; such memory addresses are known as pointers. Thus, the graphical representation of the node is quite accurate, with the arrows "pointing" to the data items. The downward arrow is known as the value pointer and the rightward arrow is known as the next node pointer or simply next pointer.
All data structures are built upon other data structures, so there must be some low-level data structure that is built into the language. Python has two such built-in structures, arrays and objects26. Objects would be a good choice as a structure upon which to build nodes, but you don't know about objects yet, so we will use arrays instead. You will learn more about objects in a later chapter and in subsequent classes.
Each of our nodes will actually be an array of length two. The first slot in the array will hold the value pointer and the second will hold the next pointer. Now that we have our structure, we will need to define a set of operations for nodes. The first one we define is a function to create a node. Such a function is known as a constructor:
def NodeCreate(value,next): #this is the constructor
return [value,next]
Note that the constructor takes two arguments, the value and the next pointer and simply returns an array holding those two values. Next, we define two operations for retrieving the data items stored in the node:
def NodeValue(n):
return n[0]
def NodeNext(n):
return n[1]
The node from which we wish to extract information is passed as an argument in calls to these two functions. The definitions rely on the fact that the value is stored in the first slot and the next pointer is stored in the second slot of the array that constitutes a node. Typically, functions that retrieve values or other information from a data structure are known as accessors or getters.
Finally, we define operations for changing the data in a node:
def NodeSetValue(n,value):
n[0] = value
return
def NodeSetNext(n,next):
n[1] = next
return
Such functions are typically known as mutators or setters, since the change the data inside the structure. Note that they implement the procedure pattern, as they do not compute any new values.
With these definitions in place, we can now make nodes and change
their values.
We will use the value None to indicated that the next pointer of
a node should not be followed. Such pointers-which-must-not-be-followed are generically known as
null pointers.
a = NodeCreate(3,None)
b = NodeCreate(4,None)
print("a's value is",NodeValue(a))
print("b's value is",NodeValue(b))
NodeSetValue(a,"two")
print("a's value now is",NodeValue(a))
Executing this code yields the following output:
a's value is 3
b's value is 4
a's value now is two
Continuing on from the previous bit of code, we can even join nodes together:
NodeSetNext(a,b) # link!
print("b's value, through a is",NodeValue(NodeNext(a)))
print("a is",a)
The first statement sets the next pointer of a to b; in other words, the null pointer of node a was replaced by a pointer to node b. Afterwards this statement, a and b are said to be linked together, as in a chain.
Indeed, a group of nodes linked together via next pointers is known as a linked list. The next pointer of the last node in a list of nodes is always null. The output of the first print statement conforms to expectation; we see the value of the node a's next pointer points to, namely b, is 4:
b's value, through a is 4
The output of the second print statement reveals that nodes are indeed arrays:
a is [2, [4, None]]
In fact, if we replace b's next pointer, currently None, with
a new node and then print a again:
NodeSetNext(b,NodeCreate(6,None))
print("a is",a)
we start to see why nodes can be used to form lists:
a is [2, [4, [6, None]]]
Note that the value 6 can only be reached through a or through b. There is no variable like a or b that points directly to that node.
Now we could stop at this point and just use nodes and their operations to make true lists, but note that the node operators freed us from knowing that nodes were based upon arrays. If you look closely at the above code where we tested our node operations, you will see that there is absolutely no clue that nodes are built from arrays. In fact, we only got an inkling of the internal structure of a node when we printed a node directly using print, which is not a node operator. Likewise, we can build a list data structure based upon nodes and be freed from both knowing lists are built from nodes and that nodes are built from arrays.
The first list operation we will define is the constructor:
def ListCreate(*args):
items = None
for i in range(len(args)-1,-1,-1): # work from right to left
items = join(args[i],items)
return items
This function has a bit of Python voodoo in it. The first
bit is the asterisk before the single formal parameter, arg.
This means that any number of arguments can be passed to
ListCreate and all of them will be bundled up into an array
that is bound to the variable args.
The second bit is the funny looking range(len(args)-1,-1,-1), which
produces the indices of all the elements in args in reverse order.
This causes the elements of args to be joined up in reverse.
The reversal is necessary because the first value that is joined ends up
at the back of items27.
When ListCreate is called with no arguments, the constructor returns an empty list:
>>> ListCreate()
None
Thus an empty list is represented by None.
If we pass a number of arguments, we see output similar
to when we joined a bunch of nodes together:
>>> ListCreate(2,"six",False)
[2, ['six', [False, None]]]
Therefore, ListCreate can take any number of values to populate a newly created list28.
The last bit of new stuff in the body of the ListCreate function is the reference to a join function:
def join(value,list):
return NodeCreate(value,list)
The join function takes a value and a list as arguments and returns a new node that glues the two arguments together. The result is a list one item larger than the list that is passed in.
We can see from the definition of ListCreate that a list
either has the value None, if there were no arguments,
or the result of join, if there were. Since join returns
a node, then a list is actually a node (or None if the list
is empty). You may be wondering at this point why we bother to distinguish
lists and nodes. One reason is that lists
in other programming languages aren't quite so easy to implement
as in Python
and, in those languages, nodes are quite distinct from lists.
Thus, we want you to be ready for those languages.
Another reason is it is important to practice the concept of
abstraction. Modern software systems exhibit a large number of
abstraction layers. With our lists, we can see three layers
of abstractions: arrays, which abstract some underlying machine
code, nodes, which abstract arrays of length two,
and lists,
which abstracts a chain of nodes.
Each level of abstraction frees us from thinking about the
details of the underlying layers.
Let us now look at the join operation more closely. Here is a rewrite that uses a temporary local variable, n, to hold the new node that will join the given value and list together:
def join(value,list):
#before
n = NodeCreate(value,None)
#during
NodeSetNext(n,list)
#after
return n
First, let us create a list to which we will join a value:
a = ListCreate(42)
Graphically, the situation looks like this:
Now, we call join to join the value of 13 to the list a.
a = join(13,a)
At the point of the #before comment in join,
the formal parameters value and list have been set to
13 and a, respectively.
The situation looks like this:
The variables list and value, since they are local to the function join,
are shown in blue29.
After the step
n = NodeCreate(value,None),
marked with the comment #during,
a new node with the given value is created and bound to
the variable n. The situation changes to:
The next step sets the next pointer
of n to point to the given list, via the
statement NodeSetNext(n,list):
Finally, the value of n is returned and a is assigned this new value:
From the drawing, we can see that list a now includes the value 13 at the front of the list, as intended. Of course, at this point the variables local to join, shown in blue, are no longer in scope.
Adding values to the front of a list is known as prepending, so join30 prepends a value to a list. Instead of adding new values to the front, we can also add values to the back of the list. Such a procedure is termed appending, but we will restrict appending to lists proper. In other words, we will only append a list to an existing list:
def append(listA,listB): #listA length must be > 0
node = listA
while (NodeNext(node) != None):
node = NodeNext(node)
NodeSetNext(node,listB)
return
Here, we set the variable node to the beginning of listA.
Then, as long as node's next pointer points to a node
and not None31,
we set node to the next node in the list. We repeat this process
until we reach the last node in the list, whose next pointer does
indeed point to None. At this point, we drop out of the loop and set the
next pointer of this last node to listB32.
Generally, you can quickly tell if a function is destructive or
non-destructive by looking at the return value. If the function
implements the procedure pattern, it is likely destructive. If
it implements the function pattern, it is likely non-destructive33.
Here is an example:
>>> a = ListCreate(2,4,6)
>>> b = ListCreate(1,3,5)
>>> append(a,b)
[2, [4, [6, [1, [3, [5, None]]]]]]
>>> NodeNext(a)
[4, [6, [1, [3, [5, None]]]]]
If we wish to append a single value to a list, we must turn the single value into a list first:
>>> a = ListCreate(2,4,6)
>>> append(a,ListCreate(8))
[2, [4, [6, [8, None]]]]
The process of moving from one node to the next in a list is known as walking the list. Obviously, the longer the list, the longer it takes to walk it. Thus, appending to a list can take much longer than prepending to the same list, since prepending takes the same amount of time regardless of how long the list is. However, as you will learn in your data structures class, there is a much faster way to append to a list. If you cannot wait to find out this clever implementation of append, search for the phrase "linked list tail pointer" on the interwebs.
Once we are able to insert values into a list, we now need functions to look at the values in the list. Two common operations of a list are head, which retrieves the first item in the list, and tail, which returns a the list made up of the all the nodes in the original list except the first node:
def head(items):
return NodeValue(items)
def tail(items):
return NodeNext(items)
As you can see, these functions are wrappers to the underlying node functions, but are used frequently because they are smaller names with generally universally understood semantics. Often, wrappers to the functions that change node values and pointers are defined as well:
def setHead(items,value):
NodeSetValue(items,value)
return
def setTail(items,next):
NodeSetNext(items,next)
return
Here is an example:
>>> a = ListCreate(1,8,7,2)
>>> a
[1, [8, [7, [2, None]]]]
>>> head(a)
1
>>> tail(a)
[8, [7, [2, None]]]
>>> head(a)
1
>>> head(tail(a))
8
>>> setHead(tail(a),0)
>>> a
[1, [0, [7, [2, None]]]]
Note that taking the tail of a list does not alter the original list in any way34.
Finally, the last two operations we will discuss retrieve and set the values of nodes anywhere in the list. To specify which node is of interest, the caller of these functions will supply an index. This index will function exactly like the zero-based index of arrays; index zero will refer to the first value, index one will refer to the second value, and so on. The function ListIndex gets a value at the given index and ListSetIndex replaces the old value at the given index with the given value. Both of these functions will use a helper function named ListIndexNode, which will return the node at a given index35:
def ListIndexNode(items,index):
node = items
for i in range(0,index,1):
node = tail(node)
return node
def ListIndex(items,index):
node = ListIndexNode(items,index)
return head(node)
def ListSetIndex(items,index,value):
node = ListIndexNode(items,index)
NodeSetValue(node,value)
return
Note that the ListIndexNode function walks the list for ListIndex and ListSetIndex. Note further that this implementation does no error checking. What happens if the index is greater than or equal to the number of nodes in the list?
Rather than have you type all this code in, the above node and list operations are bundled into a single module, named List.py. You can get the module with this command:
wget troll.cs.ua.edu/cs150/book/List.py
on Linux and this command:
curl -O troll.cs.ua.edu/cs150/book/List.py
on a Mac. For Windows, browse to:
http://troll.cs.ua.edu/cs150/book/ms/List.py
and save the page.
These versions of List.py generates better error messages for ListIndex and ListSetIndex.
This idea of walking lists is an extremely important concept; so important that we will review the two "walks" that are commonly needed when manipulating lists. The first is walking to a certain location in list. Usually, one walks to index (covered previously) or to a value. Of course, walking to a value is simply the search pattern, implemented for lists. Here is a search function that works on lists:
def find(value,items):
spot = items;
while (spot != None and head(spot) != value):
spot = tail(spot)
return spot
Note that this version returns the node containing the value if the value is
found and None otherwise. Because of the possibility that a value
may not be in the list, we must add the condition spot != None to
the while loop test. If we didn't, then spot would eventually reach
None and the head operation on None
would generate an error. Sometimes, find would be written with a simpler
loop test, but complicated by a return from the body of the loop:
def find(value,items):
spot = items;
while (spot != None):
if (head(spot) == value): return spot
spot = tail(spot)
return None
These two implementations are semantically equivalent; the former is usually preferred stylistically, but the latter is likely more common.
The second walk one is likely to encounter is a walk to the next-to-the-last node in a list, which is useful in many situations: removing the last item, adding a value to the end of a list quickly (again, search for "linked list tail pointer" for details), and so on.
One must walk the list to get to the penultimate node. Here is one attempt; it keeps two pointers when walking the list, a leading pointer and a trailing pointer that stays one node behind the leader. The walk ends when the next pointer of the leading node becomes null. We call this pattern the trailing value pattern:
def getPenultimateNode(items):
trailing = None;
leading = items;
while (tail(leading) != None): #when to stop walking
trailing = leading
leading = tail(leading)
return trailing
If we walked until the lead node became null, then the trailing node would be the last node in the list, not the next-to-the-last node. However, checking the next pointer of a node is a dangerous proposition. What happens in the case of an empty list? How many nodes must be in a list for this function to work properly? Highlight the following line to see the answers:
The above approach can be simplified by checking if the trailing pointer's next pointer points to a node whose next pointer is null. Although the check is a bit more complicated, it obviates the need for the leading node:
def getPenultimateNode(items):
trailing = items;
while (tail(tail(trailing)) != None):
trailing = tail(trailing)
return trailing
The two versions of getPenultimateNode are similar, but not exactly equivalent. How many nodes must be in a list for the second version to work properly? Highlight the following line to see the answers:
An application of the trailing value pattern is to insert a value into an ordered list so that the list remains ordered36. For example, if the ordered list is:
[1,[3,[8,[13,[14,[17,None]]]]]]
^ ^
and we wish to insert 10 into the list, we must place the 10 between the
8 and the 13 so that the list values remain in increasing order (from left
to right). The ^ marks show where the trailing and leading pointers
should end up when the walk is finished. If the 10 is inserted between
the trailing and leading pointers, we end up with:
[1,[3,[8,[10,[13,[14,[17,None]]]]]]]
as desired. Using getPenultimateNode as a starting point, we have:
def insertInOrder(value,items):
trailing = None;
leading = items;
while (...): #when to stop walking
trailing = leading
leading = tail(leading)
# insert new value in between trailing and leading
...
return "done"
The ellipses mark where changes to the trailing value pattern need to be made. The first ellipsis (the while test) should force the walk to stop when the correct insertion point is found. Clearly, that is when leading value is greater than the value to be inserted. The second ellipsis concerns how the actual insertion is to be performed. We know we will need to:
All that's left then is to fill in the blanks. Here is the result:
def insertInOrder(value,items):
trailing = None;
leading = items;
while (head(leading) < value): #when to stop walking
trailing = leading
leading = tail(leading)
# insert new value in between trailing and leading
n = NodeCreate(value,None)
NodeSetNext(n,leading)
NodeSetNext(trailing,n)
return "done"
Note that we wanted to stop when the leading value is greater than the value to be inserted. Since we are working with a while loop, we need to reverse the logic so that the walk continues as long as the leading value is less than the new value37.
Testing our code with the above example yields:
>>> a = ListCreate(1,3,8,13,14,17)
>>> insertInOrder(10,a)
>>> a
[1,[3,[8,[10,[13,[14,[17,None]]]]]]]
It appears our code works correctly! Very exciting! Unfortunately, the excitement of getting code to work seduces both novice and experienced Computer Scientists alike into thinking the task is finished. Many times, including this case in particular, initial success only means the basic idea is correct, not that the code is correct, for all possible inputs. Indeed, for our implementation of insertInOrder, there are edge cases for which this code fails. An edge case is a set of inputs that force the code to do as much (or as little) work as possible. For insertInOrder, what inputs causes the function to do as much work as possible? The only place where a variable amount of work is performed is the loop that walks the list. In order to make the loop go as many times as possible, it is clear that we would need to insert a value at the very end of the list. Doing so yields:
>>> a = ListCreate(1,3,8,13,14,17)
>>> insertInOrder(20,a)
Traceback (most recent call last):
File "test.py", line 15, in <module>
print("inserting 18:",insertInOrder(20,a))
File "test.py", line 6, in insertInOrder
while (head(leading) < value): #when to stop walking
File "/home/lusth/l1/book/List.py", line 34, in head
return NodeValue(items)
File "/home/lusth/l1/book/List.py", line 12, in NodeValue
def NodeValue(n): return n[0]
TypeError: 'NoneType' object is not subscriptable
Likewise, making the loop do as little work as possible yields errors in the code. What inputs would cause the loop run very little or not at all? Obviously, if the value to be inserted is smaller than any value in the list, the loop does not need to be run. Less obviously, what if the given list is empty? Fixing insertInOrder to handle these edge cases is left as an exercise.
Lists can be processed with loops, much like arrays. In fact, there is a direct translation from an array loop to a list loop. Suppose you have the following array loop:
for i in range(0,len(items),1): # items is an array
# loop body here
...
The corresponding list loop would be:
spot = items # items is an list
while (spot != None):
# loop body here
...
spot = tail(spot)
In addition, the empty array [] is replaced by the empty list None
and the array reference items[i] is replaced by head(items).
Finally,
references to the append method of arrays is replaced by calls
to the join function. For example:
results.append(items[i])
is replaced by:
result = join(head(spot),results)
Let's look a some example translations. The counting pattern is very important for lists, since there is no built-in function like len for determining the length of a list. The array version of this pattern is:
count = 0
for i in range(0,len(items),1):
count += 1
Translating the code to work with arrays using the above substitutions yields:
count = 0
spot = items #typical list walking initialization
while (spot != None): #typical list walking condition
count += 1
spot = tail(spot) #typical list walking update
As with arrays, each "step" taken results in a counter being incremented. For another example, let's look at a filtered accumulation. First, the array pattern:
def sumEvens(items):
total = 0
for i in range(0,len(items),1):
if (items[i] % 2 == 0):
total += items[i]
return total
Now the list translation:
def sumEvens(items):
total = 0
spot = items
while (spot != None):
if (head(spot) % 2 == 0):
total += head(spot)
spot = tail(spot)
return total
Finally, let's look at the extreme index pattern, since it adds a slight wrinkle. Here is the array version:
ilargest = 0
for i in range(1,len(items),1):
if (items[i] > items[ilargest]):
ilargest = i
Note that we have no instructions for translating items[ilargest].
We can, however, save both the largest value seen so far as well as the current
index:
index = 0
spot = items
largest = head(spot)
ilargest = 0
while (spot != None):
if (head(spot) > largest):
ilargest = index
largest = head(spot)
index += 1
spot = tail(spot)
In general, any processing that requires finding an index will be more complicated with a list.
Filtering and mapping lists with loops is problematic, since using join instead of append ends up reversing the elements in the resulting list. This is because join puts an element at the beginning and thus the last element joined ends up at the front of the list and the next-to-the-last element ends up in the second position, and so on. Still, if the order of the result doesn't matter, then lists and loops go well together for this kind of processing. Here is an array loop that filters out the odd elements, leaving the even ones in the result:
def extractEvens(items):
evens = []
for i in range(0,len(items),1):
if (isEven(items[i])):
evens.append(items[i])
return evens
Translating this to list notation yields:
def extractEvens(items):
evens = None
spot = items
while (spot != None):
if (isEven(head(spot))):
evens = join(head(spot),evens)
spot = tail(spot)
return evens
Because of the reversal:
extractEvens(ListCreate(2,6,5,3,9,8,4))
returns:
[4,[8,[6,[2,None]]]]
As with arrays, shuffling and merging lists using loops is rather complicated. So much so that we will not even bother discussing loop-with-list versions. However, using recursion with lists greatly simplifies this type of processing, as we shall see in the next chapter.
The primary reason for accidentally implementing the fossilized pattern is to forget the while loop update. For example, while:
spot = items
while (spot != None):
# loop body here
...
spot = tail(spot)
may have been intended, what gets coded is:
spot = items
while (spot != None):
# loop body here
...
Since spot is never updated, it never becomes None (unless, of course, it was None to begin with). This leads to an infinite loop.
The wrong-spot pattern is another common error. Here instead of referring to spot, one mistakenly refers to items instead. An example is this attempt at finding the smallest value in a list:
smallest = head(items)
spot = items
while (spot != None):
if (head(spot) < smallest):
smallest = head(items) # ERROR: should use spot here
spot = tail(spot)
This loop never sets smallest to anything except the first value in items.
You may be asking, why go through all this trouble to implement lists when arrays seem to do every thing lists can do. The reason comes down to the fact that no data structure does all things well. For example, a data structure may be very fast when putting values in, but very slow in taking values out. Another kind of data structure may be very slow putting values in but very fast taking values out. This is why there are quite a number of different data structures, each with their own strengths.
In the case of lists and arrays, adding an item to a list can be much quicker than adding an item to an array. Consider four scenarios:
Scenarios 1, 2, and 3 all take about the same amount of time, whereas scenario 4 takes about a million times longer than the other scenarios. This is because when a value is joined to an array, a new array is created and all the elements in the original array are copied into the new array. If the old array has one value, then one value is copied. If the old array has a million values, then a million values are copied.
Another reason is that, in other programming languages besides Python, arrays have a fixed size and you cannot put more values in the array than there is room. A list, on the other hand, can grow without bound, subject to the memory available to the program. That said, sometimes arrays are the better data structure to use. Therefore, the List module contains two functions for converting between arrays and lists:
ArrayToList(items) # where items is an array; a list is returned
ListToArray(items) # where items is a list; an array is returned
Use these functions when the data is in one form, but you need it in another.
We will use lists heavily in the next chapter on recursion.
for i in args: #an alternate loop
items = join(i,items)
Explain what happens and why when more than one argument is passed to ListCreate.
(reverse (ListCreate 1 2 3 4)) should return the list
[4, [3, [2, [1, None]]]].
For Linux and Mac users, the code from this chapter can be found here.
For Windows users, look here (you will need to save the file as lists.py).
lusth@cs.ua.edu
![]() |
![]() |
![]() | Contents |