
Input and Loops

You can download the functions defined in this chapter with the following commands:

    wget troll.cs.ua.edu/ACP-C/moreInput.c
    wget troll.cs.ua.edu/ACP-C/moreInput.h

These files will help you run the test code listed in this chapter.

Converting command line arguments en masse

Now that we have learned how to loop, we can perform more sophisticated types of input. Suppose all the command-line arguments are integers that need to be converted from their string versions stored in argv. We can use a loop and the accumulate pattern to accumulate the converted string elements:

    void
    convertArgsToNumbers(int argc,char **argv,int *numbers)
        {
        int i;
        // start at 1 to skip over program file name
        for (i = 1; i < argc; ++i)
            numbers[i-1] = atoi(argv[i]);
        }

Note the use of i-1 as the index for the numbers array. This makes sure the second element of argv corresponds to the first element of numbers, and so on. A careful analysis of this function yields the following insight: the function implements the map pattern. The function being mapped is atoi and we map over the argv array of strings.

We can test our conversion function:

    //test (compile with scanner.c and moreInput.c)
    #include "scanner.h"
    #include "moreInput.h"
    int i;
    int *nums;

    nums = allocate(sizeof(int) * (argc-1));
    convertArgsToNumbers(argc,argv,nums);

    printf("original: ");
    for (i = 0; i < argc; ++i)
        printf("[\"%s\"]",argv[i]);
    printf("\n");

    printf("converted: ");
    for (i = 0; i < argc-1; ++i) //one fewer item in nums
        printf("[%d]",nums[i]);
    printf("\n");

Note the use of the string directive in the first loop and the integer directive in the second. Let's look at our test code's behavior, assuming we place the code in convert.c:

   $ gcc -Wall -g -o convert convert.c scanner.c moreInput.c
   $ convert 1 34 -2
   original: ["convert"]["1"]["34"]["-2"]
   converted: [1][34][-2]

We see that the elements of argv are indeed strings. Otherwise, the "%s" directive would not have worked.
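
One caveat: atoi gives no indication when an argument is not a number at all. The following is a minimal sketch of a stricter conversion built on the standard strtol function. The names convertArg and convertArgsChecked are hypothetical and are not part of moreInput.c.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

// hypothetical helper, not part of moreInput.c
// converts one string; returns 1 and stores the value on success, 0 otherwise
int
convertArg(const char *text,int *value)
    {
    char *end;
    long result;

    errno = 0;
    result = strtol(text,&end,10);
    if (end == text) return 0;               //no digits at all
    if (*end != '\0') return 0;              //trailing junk, as in "12abc"
    if (errno == ERANGE) return 0;           //too big for a long
    if (result < INT_MIN || result > INT_MAX) return 0;  //too big for an int
    *value = (int) result;
    return 1;
    }

// like convertArgsToNumbers, but returns how many arguments failed to convert
int
convertArgsChecked(int argc,char **argv,int *numbers)
    {
    int i;
    int failures = 0;
    // start at 1 to skip over program file name
    for (i = 1; i < argc; ++i)
        {
        numbers[i-1] = 0;                    //slot holds 0 if conversion fails
        if (!convertArg(argv[i],&numbers[i-1]))
            ++failures;
        }
    return failures;
    }
```

A caller could print a usage message and quit whenever convertArgsChecked returns something other than zero.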

Reading individual items from files

We can read an unknown number of items from a file by using functions found in the scanner module. When doing so, we always follow this pattern:

    open the file
    read the first item
    while the read was good
        process the item
        read the next item
    close the file

In C, we tell if the read was good by checking the file pointer for the end-of-file condition. We do this with the feof function, which returns true if the most recent read ran into the end of the file and false otherwise. With that detail, the while loop is refined to:

    FILE *fp = fopen(fileName,"r");
    //read the first item
    ...
    while (!feof(fp))
        {
        //process the item
        ...
        //read the next item
        ...
        }
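
The same read, test, process, read shape can be sketched using only the standard library. In this version, which is an illustration and not scanner module code, the value returned by fscanf plays the role of "the read was good". The function name sumIntegers is hypothetical.

```c
#include <stdio.h>

// hypothetical example; returns the sum of the integers in the file,
// or 0 if the file cannot be opened
long
sumIntegers(const char *fileName)
    {
    FILE *fp = fopen(fileName,"r");          //open the file
    long total = 0;
    int number;
    int good;

    if (fp == NULL) return 0;
    good = fscanf(fp,"%d",&number) == 1;     //read the first item
    while (good)                             //while the read was good
        {
        total += number;                     //process the item
        good = fscanf(fp,"%d",&number) == 1; //read the next item
        }
    fclose(fp);                              //close the file
    return total;
    }
```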

Processing files a line at a time

Here is a function that reads and writes all the lines in a file, one line at a time. In addition, the function returns the number of lines processed. It makes use of the readLine function in the scanner module.

    int
    copyFile(char *inFile,char *outFile)
        {
        FILE *in = fopen(inFile,"r");       //check for failed open omitted
        FILE *out = fopen(outFile,"w");     //check for failed open omitted
        int count = 0;
        char *line;

        line = readLine(in);        //readLine is a scanner module function
        while (!feof(in))
            {
            ++count;
            fprintf(out,"%s\n",line);
            free(line);             //we're done with line, so free it
            line = readLine(in);
            }

        //always close your files
        fclose(in);
        fclose(out);

        return count;
        }

Notice we used the counting pattern, augmented by printing out the current line to the output file every time the count was incremented. Since the readLine function allocates space for the line it reads, we must free it when we are done with it. Note that you only need to free items read with the readLine, readToken, and readString functions. You should only free these items when you are absolutely done with them.
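
To see why a line must be freed, it helps to look at how a line reader might obtain its space. The following is a rough sketch of such a function. It is a hypothetical stand-in named readLineSketch, not the scanner module's actual readLine, and it signals the end of the file by returning a null pointer.

```c
#include <stdio.h>
#include <stdlib.h>

// hypothetical stand-in for readLine; NOT the scanner module's code
// returns a malloc'd line without its newline, or NULL at end of file
char *
readLineSketch(FILE *fp)
    {
    int size = 16;                           //initial room for the line
    int length = 0;
    int ch;
    char *line;

    ch = fgetc(fp);
    if (ch == EOF) return NULL;              //nothing left to read

    line = malloc(size);
    if (line == NULL) return NULL;
    while (ch != EOF && ch != '\n')
        {
        if (length + 1 == size)              //keep room for the terminator
            {
            char *bigger;
            size = size * 2;
            bigger = realloc(line,size);
            if (bigger == NULL) { free(line); return NULL; }
            line = bigger;
            }
        line[length] = (char) ch;
        ++length;
        ch = fgetc(fp);
        }
    line[length] = '\0';
    return line;                             //the caller must free this
    }
```

Every successful call hands back freshly allocated space, so a loop that reads many lines without freeing them leaks one block per line.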

Using functions in the scanner module

Recall that the scanner module contains a number of useful functions for reading input. To use these functions, you must first download the module (see the first section of this chapter). Then, you need to tell the compiler what those functions look like before you call them. You do that by including scanner.h in any source code file that calls scanner functions:

    #include "scanner.h"

Finally, you need to compile the scanner source in with your main program:

    gcc -Wall -g mainProgram.c scanner.c

Here is a program that utilizes the copyFile function defined above:

    //test (compile with scanner.c and moreInput.c)
    #include "scanner.h"
    #include "moreInput.h" //contains prototype for copyFile

    int count;
        
    count = copyFile("copycat.c","junk.c");
    printf("%d lines copied.\n",count);

If this program is placed in a file copycat.c, compiled, and then run:

    $ gcc -Wall -g -o copycat copycat.c scanner.c moreInput.c
    $ copycat

Then you should find a copy of copycat.c in the file junk.c.

Patterns for reading input

The same patterns that are commonly used for processing arrays can be used for reading input. For example, here is a function that implements a filtered count:

    int
    countSmallNumbers(char *fileName)
        {
        FILE *fp = fopen(fileName,"r");    //check for failed open omitted
        int count;
        int number;
        
        count = 0;
        number = readInt(fp);              //read the first integer
        while (!feof(fp))                  //check if the read was good
            {
            if (number < SIZE_LIMIT)       //smaller than SIZE_LIMIT, then small!
                ++count;
            number = readInt(fp);          //read the next integer
            }
        fclose(fp);                        //always close files when done
        return count;
        }

Note the use of the standard reading pattern: opening the file, making the first read, testing if the read was good, processing the item read (by counting it), reading the next item, and finally closing the file after the loop terminates. Using the scanner functions always means performing these steps, as given in the comments.
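
Other array patterns carry over in the same way. As an illustration, here is a sketch of the extreme pattern applied to input: finding the largest integer in a file. It uses the standard fscanf function rather than the scanner module, and the name largestInFile is hypothetical.

```c
#include <stdio.h>

// hypothetical example of the extreme pattern applied to input
// returns 1 and stores the largest integer in *largest,
// or returns 0 if the file could not be opened or held no integers
int
largestInFile(const char *fileName,int *largest)
    {
    FILE *fp = fopen(fileName,"r");
    int number;
    int found = 0;
    int good;

    if (fp == NULL) return 0;
    good = fscanf(fp,"%d",&number) == 1;     //read the first integer
    while (good)                             //check if the read was good
        {
        if (!found || number > *largest)     //first item, or a new extreme
            *largest = number;
        found = 1;
        good = fscanf(fp,"%d",&number) == 1; //read the next integer
        }
    fclose(fp);                              //always close files when done
    return found;
    }
```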

Reading Items into an Array

Note that the countSmallNumbers function is doing two things: reading the numbers and counting the small ones. It is said that this function has two concerns, reading and counting. A fundamental principle of Computer Science is the separation of concerns. To separate the concerns, we have one function read the items, storing them into an array (reading and storing is considered to be a single concern). We then have another function count the small items in the array. Thus, we will have separated the two concerns into separate functions, each with its own concern.

Reading in an undetermined number of items from a file is problematic. We don't know how big to allocate our array that will hold the items that we read. We have three choices:

  1. allocate a really big array and stop reading if it fills up
  2. read the file twice, the first time to count, the second to store
  3. grow the array if it fills up

The first choice is a very poor one. The second works well, but is inefficient. Still, it is a viable strategy, so let's explore that approach.

Reading a file to size an array

For this example, we will count the number of short tokens in a file. To count tokens in a file, we modify the countSmallNumbers function above to read in tokens instead of integers and remove the if that guards the increment of the count. Call this function countTokens.

    int
    countTokens(char *fileName)
        {
        FILE *fp = fopen(fileName,"r");    //check for failed open omitted
        int count;
        char *token;
        
        count = 0;
        token = readToken(fp);             //read the first token
        while (!feof(fp))                  //check if the read was good
            {
            ++count;
            free(token);                   //we've counted it, so free it
            token = readToken(fp);         //read the next token
            }
        fclose(fp);                        //always close files when done
        return count;
        }

Once we have counted the token, then we can free it. We call countTokens to get the number of tokens in the file:

    int count = countTokens(fileName);

Next, we allocate the array with exactly the right number of slots:

    char **tokens = malloc(sizeof(char *) * count);

Finally, we fill the array with a call to a procedure named fillArrayWithTokens:

    fillArrayWithTokens(fileName,tokens);

The fillArrayWithTokens function is left as an exercise for the reader.
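
So as not to give away the exercise, here is a sketch of the same two-pass strategy applied to integers instead of tokens. It relies only on the standard library and rewinds the open file for the second pass rather than opening it again. The name readIntsTwoPass is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

// hypothetical example: read every integer in a file into an exact-size array
// returns the array (caller frees it) and stores its length in *finalSize;
// returns NULL if the file cannot be opened or memory runs out
int *
readIntsTwoPass(const char *fileName,int *finalSize)
    {
    FILE *fp = fopen(fileName,"r");
    int count = 0;
    int i;
    int number;
    int *items;

    *finalSize = 0;
    if (fp == NULL) return NULL;

    //first pass: count the integers
    while (fscanf(fp,"%d",&number) == 1)
        ++count;

    //allocate exactly enough room (at least one slot, avoiding malloc(0))
    items = malloc(sizeof(int) * (count > 0 ? count : 1));
    if (items == NULL) { fclose(fp); return NULL; }

    //second pass: return to the start of the file and store the integers
    rewind(fp);
    for (i = 0; i < count; ++i)
        {
        if (fscanf(fp,"%d",&items[i]) != 1) break;
        }
    fclose(fp);
    *finalSize = i;
    return items;
    }
```

The inefficiency mentioned earlier is visible here: every integer in the file is parsed twice.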

Resizing an array as needed

More sophisticated C programs use the third choice. The scanner module contains a function, reallocate, that will help us do that. The reallocate function is merely a wrapper for the realloc function, in much the same way that allocate is a wrapper for malloc. What realloc and reallocate do, when given a previously allocated array and a new size, is, in effect, to allocate a new array with the given size and copy over elements from the old array to the new array. These routines return the new array, so the old array variable is often reassigned with the return value. For example, to reallocate an integer array items with an original size of size to a new size of newSize, we would do something like this:

    size = newSize;                                //reset the size
    items = reallocate(items,sizeof(int) * size);  //reset the array

The realloc and reallocate functions can be used to shrink an array as well as grow it.
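
One reason to prefer a wrapper is that realloc returns a null pointer when it cannot satisfy a request, leaving the old array untouched. Assigning that result straight back to items would then lose the only pointer to the old array. Here is a sketch of what such a wrapper might look like. The name reallocateSketch is hypothetical, and the scanner module's actual reallocate may differ.

```c
#include <stdio.h>
#include <stdlib.h>

// hypothetical wrapper in the spirit of reallocate;
// the scanner module's actual code may differ
// size is assumed to be greater than zero
void *
reallocateSketch(void *items,size_t size)
    {
    void *result = realloc(items,size);
    if (result == NULL)                      //request could not be satisfied
        {
        fprintf(stderr,"out of memory\n");
        exit(1);
        }
    return result;                           //old contents have been preserved
    }
```

Because the wrapper never returns a null pointer, the caller can safely reassign the array variable with the return value.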

Here is the reading (and storing) function, which grows an array as needed using reallocate:

    char **
    readTokens(char *fileName,int *finalSize)
        {
        FILE *fp = fopen(fileName,"r");     //check for failed open omitted
        int size = 10;                      //initial size of destination array
        char *token;
        int count;

        //allocate the destination array
        //char ** is a pointer to an array of strings (tokens are strings)
        char **items = allocate(sizeof(char *) * size);

        count = 0;
        token = readToken(fp);
        while (!feof(fp))
            {
            if (count == size)              //array is full!
                {
                // grow the array by doubling its size
                size = size * 2;
                items = reallocate(items,sizeof(char *) * size);
                //now there is enough room
                }
            items[count] = token;           //DO NOT FREE THE TOKEN!
            ++count;
            token = readToken(fp);
            }
        fclose(fp);

        //shrink the array to 'count' number of elements
        items = reallocate(items,sizeof(char *) * count);

        //count holds the number of items, store it in *finalSize
        *finalSize = count;
        return items;
        }

The readTokens function is complicated by the fact that it needs to return two things, the array of tokens that were found in the file, plus the size of the array. Unfortunately, C only allows a function to return one thing. So, we have chosen to return the array and to update one of the arguments to the function, which we do on the second to the last line of the function. To use readTokens, we need to define a variable to hold the size of the array and a variable to point to that array:

    int size;
    char **tokenArray;

    tokenArray = readTokens(fileName,&size); //send the address of size
    //size now holds the number of elements in the tokenArray
    //tokenArray points to the array of tokens (each token is a string)

Another thing that is different about the readTokens function is we do not free the token after it has been read. Instead, we store it in the destination array and leave it to the caller of the function to free the tokens and the array itself when it is done with them.

Finally, note that readTokens shrinks its array to its final size after filling and growing. This is because as much as 50% of the array might not be used.
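
To make the arithmetic concrete, the sketch below computes how many slots the array would hold before the final shrink, given the initial size of 10 and the doubling used above. The function name capacityAfterReading is hypothetical.

```c
// hypothetical helper: the number of slots in the array just before
// the final shrink, after 'count' items have been stored
int
capacityAfterReading(int count)
    {
    int size = 10;                           //initial size, as in readTokens
    while (size < count)                     //each fill-up doubles the array
        size = size * 2;
    return size;
    }
```

Reading 11 tokens, for instance, leaves an array of 20 slots with 9 of them unused. The wasted fraction stays below one half, approaching it when the count is just past a power-of-two multiple of the initial size.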

Next, we implement the function to perform the filtered counting. Instead of passing the file name, as before, we pass the array of tokens that were read, plus the size of that array:

    int
    countShortTokensInArray(char **items,int size)
        {
        int i,count;

        count = 0;
        for (i = 0; i < size; ++i)
            {
            if (strlen(items[i]) < SHORT_LIMIT)
                count += 1;
            }
        return count;
        }

The countShortTokensInArray function is now simpler than the original function. This makes it easier to fix any errors in the function since you can concentrate on the single concern implemented by that function.

We can test our two functions with the following code:

    //test (compile with scanner.c and moreInput.c)
    #include "scanner.h"
    #include "moreInput.h"

    int count;
    int size;
    char **tokens;

    count = countShortTokens("tester.c");
    printf("counting tokens directly: %d tokens\n",count);

    tokens = readTokens("tester.c",&size);
    count = countShortTokensInArray(tokens,size);
    printf("counting tokens in two stages: %d tokens\n",count);

Reading Records into an Array

Often, data in a file is organized as records, where a record is just a collection of consecutive and related tokens. Each token in a record is known as a field. Suppose every four tokens in a file comprises a record:

    "Amber Smith"       President   32   97000.05
    "Thad Jones"        Assistant   15   89000.42
    "Ellen Thompson"    Hacker       2  147000.99

Typically, we define a function to read one collection of tokens at a time. Here is a function that reads a single record:

    char **
    readRecord(FILE *fp)                // we pass the file pointer in
        {
        char *name,*title,*years,*salary;
        char **record;
        
        name = readString(fp);           //name is a string, not a token

        if (feof(fp)) { return 0; }      // no record, return null

        title = readToken(fp);
        years = readToken(fp);
        salary = readToken(fp);

        //make an empty record
        record = allocate(sizeof(char *) * 4);  //there are four fields

        //fill the record
        record[0] = name;
        record[1] = title;
        record[2] = years;
        record[3] = salary;

        return record;
        }

Note that we return either a record as an array or the null pointer (zero) if no record was read. Note that we store every field as a string, even though the last two fields are numbers. This is because arrays are homogeneous. In a later chapter when we learn about structures, we will be able to create records that store heterogeneous data, such as years as an integer and salary as a real number.

To total up all the salaries, for example, we can use an accumulation loop (assuming the salary data resides in a file represented by the string fileName).

We do so by repeatedly calling readRecord, remembering to convert the string representing the salary in the record to a real number using atof:

    double
    totalPay(char *fileName)
        {
        FILE *fp;
        char **record;
        double total;

        fp = fopen(fileName,"r");       //check for failure to open omitted
        total = 0;
        record = readRecord(fp);
        while (!feof(fp))               //see if the read was good
            {
            total += atof(record[3]);
            freeRecord(record);         //done with record, free it
            record = readRecord(fp);
            }
        fclose(fp);                     //always close file
        return total;
        }

Note that it is the job of the caller of readRecord to open the file, repeatedly send the file pointer to readRecord, and close the file pointer when done. As before, we always check if the read was good using feof, although we could equivalently test if readRecord returns zero.
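
For comparison, here is a sketch of the same accumulation that tests the value returned by the read itself instead of calling feof. It swaps in the standard fscanf function for readRecord, and it assumes that names are enclosed in double quotes and that names and titles are shorter than 64 characters. The name totalPayWithFscanf is hypothetical.

```c
#include <stdio.h>

// hypothetical alternative to totalPay that uses fscanf instead of readRecord
// assumes names are double-quoted; names and titles are under 64 characters
double
totalPayWithFscanf(const char *fileName)
    {
    //quoted name, then title, years, and salary
    const char *format = " \"%63[^\"]\" %63s %d %lf";
    FILE *fp = fopen(fileName,"r");
    char name[64];
    char title[64];
    int years;
    double salary;
    double total = 0;
    int fields;

    if (fp == NULL) return 0;
    //a complete record yields four converted fields
    fields = fscanf(fp,format,name,title,&years,&salary);
    while (fields == 4)                      //see if the read was good
        {
        total += salary;
        fields = fscanf(fp,format,name,title,&years,&salary);
        }
    fclose(fp);                              //always close file
    return total;
    }
```

Since the fields land in local variables, there is nothing to free in this version. The price is the fixed upper limit on the length of a name or title.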

Note also the use of a function named freeRecord. If we just return the space occupied by the record with free:

    free(record);

we will release the record array, but not the individual elements, which also need to be freed. So here is an implementation of freeRecord:

    void
    freeRecord(char **r)
        {
        //free the fields first
        free(r[0]); //name
        free(r[1]); //title
        free(r[2]); //years
        free(r[3]); //salary
        //now free the record array
        free(r);
        }

Both totalPay and freeRecord suffer from a stylistic flaw: they use those magic numbers we read about in the chapter on assignment. It is not clear from the code that the field at index three is the salary. A second issue for totalPay is that the function has two concerns (reading and accumulating). We will fix the magic number problem first. To make the code more readable, we set up constants representing the indices of the various fields of a record. We place them in the global scope (so that they will be visible everywhere), at the top of the file (usually after the includes but before the function prototypes):

    #define NAME 0
    #define TITLE 1
    #define SERVICE 2
    #define SALARY 3

In C, we can create constants by using the #define preprocessor directive. Our accumulation loop now becomes:

    total = 0;
    record = readRecord(fp);
    while (!feof(fp))
        {
        total += atof(record[SALARY]);
        freeRecord(record);
        record = readRecord(fp);
        }

The body of freeRecord now becomes:

    free(r[NAME]);
    free(r[TITLE]);
    free(r[SERVICE]);
    free(r[SALARY]);
    ...

We can also rewrite our readRecord function to use the field index constants:

    ...
    //fill the record
    record[NAME] = name;
    record[TITLE] = title;
    record[SERVICE] = years;
    record[SALARY] = salary;

Even if someone changes the constants to:

    #define NAME 3
    #define TITLE 2
    #define SERVICE 1
    #define SALARY 0

The code still works correctly. Now, however, the salary resides at index 0, but the accumulation loop is still accumulating the salary due to its use of the constant to access the salary.
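
This claim can be checked with a small self-contained sketch that uses the permuted constants. The helper names fillRecord and salaryOf are hypothetical. Both touch the record only through the constants, so neither cares where the salary actually lives.

```c
#include <stdlib.h>

//the permuted field indices from the text
#define NAME 3
#define TITLE 2
#define SERVICE 1
#define SALARY 0

// hypothetical helper: fill a caller-supplied four-slot record
// using only the field index constants
void
fillRecord(char **record,char *name,char *title,char *years,char *salary)
    {
    record[NAME] = name;
    record[TITLE] = title;
    record[SERVICE] = years;
    record[SALARY] = salary;
    }

// hypothetical helper: fetch the salary as a real number
double
salaryOf(char **record)
    {
    return atof(record[SALARY]);
    }
```

Code that mixes the two styles, say by filling the record with the constants but reading the salary with a literal 3, would break as soon as the constants changed.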

Creating an Array of Records

We can separate the two concerns of the totalPay function by having one function read the records into an array and having another total up the salaries. An array of records is known as a table. Creating the table is just like creating a record, but instead of storing tokens in a record array, we store records in a table array. Because a token is an array of characters and characters have type char, a token has type char *. Every time we make an array to hold some type X, the type of the array is X *. A record, which is an array of tokens, has type char **. We simply add a star for each level of array. Since a table is an array of records, its type is char ***. If we wanted an array of tables, what would that array's type be?

Recall the readTokens function that stored tokens in a growing array. Here is a readTable function, which takes the same approach:

    char *** 
    readTable(char *fileName,int *finalSize)
        {
        FILE *fp;
        int count;
        int size = 10;                  //initial size of destination array
        char **record;
        char ***table;

        fp = fopen(fileName,"r");     //check for failed open omitted

        //allocate the destination array
        table = allocate(sizeof(char **) * size);

        count = 0;
        record = readRecord(fp);
        while (!feof(fp))
            {
            if (count == size)              //array is full!
                {
                // grow the array by doubling its size
                size = size * 2;
                table = reallocate(table,sizeof(char **) * size);
                //now there is enough room
                }
            table[count] = record;           //DO NOT FREE THE RECORD!
            ++count;
            record = readRecord(fp);
            }
        fclose(fp);

        //shrink the array to 'count' number of elements
        table = reallocate(table,sizeof(char **) * count);

        //count holds the number of items, store it in *finalSize
        *finalSize = count;
        return table;
        }

The accumulation function is straightforward, now that records have been stored in an array:

    double
    totalPay2(char ***table,int size)
        {
        int i;
        double total = 0;

        for (i = 0; i < size; ++i)
            {
            char **record = table[i];
            total += atof(record[SALARY]);
            }

        return total;
        }

We can simplify this function by removing the temporary variable record:

    double
    totalPay2(char ***table,int size)
        {
        int i;
        double total = 0;

        for (i = 0; i < size; ++i)
            {
            total += atof(table[i][SALARY]);
            }
        return total;
        }

Since a table is just an array, we can walk it, accumulate items in each record (as we just did with salary), filter it, and so on.
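
As an illustration of filtering, here is a sketch of a filtered count over a table. The function name countHighEarners and its limit parameter are hypothetical. SALARY is defined as in the original set of constants.

```c
#include <stdlib.h>

#define SALARY 3    //index of the salary field, as first defined in the text

// hypothetical example: count the records whose salary exceeds limit
int
countHighEarners(char ***table,int size,double limit)
    {
    int i;
    int count = 0;

    for (i = 0; i < size; ++i)
        {
        if (atof(table[i][SALARY]) > limit)  //filter on the salary field
            ++count;
        }
    return count;
    }
```

With the three employee records shown earlier and a limit of 90000, the count would be 2 (Smith and Thompson).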

When we are done with the table, we need to free it. As with freeRecord, we cannot just free the table array; we also need to free what the array holds. Fortunately, the table holds records and we have already written freeRecord:

    void
    freeTable(char ***table,int size)
        {
        int i;
        //free the records first
        for (i = 0; i < size; ++i)
            {
            freeRecord(table[i]);
            }
        //now free the table array
        free(table);
        }

Tying it all together into a program, and putting some employment records into a file named employees.dat gives us:

    //test (compile with scanner.c and moreInput.c)
    #include "scanner.h"
    #include "moreInput.h"

    int size;
    char ***table = readTable("employees.dat",&size);
    double total = totalPay2(table,size);

    printf("total salary: $%.2f\n",total);

If employees.dat just contains the records of employees Smith, Jones, and Thompson above, then the output from this program should be:

    total salary: $333001.46

lusth@cs.ua.edu

