# Input and Loops

You can download the functions defined in this chapter with the following commands:

```
wget troll.cs.ua.edu/ACP-C/moreInput.c
wget troll.cs.ua.edu/ACP-C/moreInput.h
```

These files will help you run the test code listed in this chapter.

## Converting command line arguments en masse

Now that we have learned how to loop, we can perform more sophisticated types of input. Suppose all the command-line arguments are integers that need to be converted from their string versions stored in argv. We can use a loop to convert each string and store the results in an array:

```
void
convertArgsToNumbers(int argc,char **argv,int *numbers)
{
    int i;
    // start at 1 to skip over program file name
    for (i = 1; i < argc; ++i)
        numbers[i-1] = atoi(argv[i]);
}
```

Note the use of `i-1` as the index for the numbers array. This makes sure the second element of argv corresponds to the first element of numbers, and so on. A careful analysis of this function yields the following insight: the function implements the map pattern. The function being mapped is atoi and we map over the argv array of strings.

We can test our conversion function:

```
//test (compile with scanner.c and moreInput.c)
#include "scanner.h"
#include "moreInput.h"
int i;
int *nums;

nums = allocate(sizeof(int) * (argc-1));
convertArgsToNumbers(argc,argv,nums);

printf("original: ");
for (i = 0; i < argc; ++i)
    printf("[\"%s\"]",argv[i]);
printf("\n");

printf("converted: ");
for (i = 0; i < argc-1; ++i) //one fewer item in nums
    printf("[%d]",nums[i]);
printf("\n");
```

Note the use of the string directive in the first loop and the integer directive in the second. Let's look at our test code's behavior, assuming we place the code in convert.c:

```
$ gcc -Wall -g -o convert convert.c scanner.c moreInput.c
$ convert 1 34 -2
original:  ["convert"]["1"]["34"]["-2"]
converted: [1][34][-2]
```

We see that the elements of argv are indeed strings. Otherwise, the `"%s"` directive would not have worked.
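One limitation of the conversion above is that atoi has no way to report a malformed argument: a string such as "12abc" or "hello" is converted without complaint. The following is a minimal sketch of a stricter conversion built on the standard strtol function. The name parseInt is hypothetical and is not part of moreInput.c.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

// Hypothetical helper (not part of moreInput.c): converts the string s
// to an int, storing the result in *out.
// Returns 1 on success and 0 if s is not a complete, in-range integer.
int
parseInt(const char *s,int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(s,&end,10);
    if (end == s || *end != '\0') return 0;            //no digits or trailing junk
    if (errno == ERANGE) return 0;                     //does not fit in a long
    if (value < INT_MIN || value > INT_MAX) return 0;  //does not fit in an int
    *out = (int) value;
    return 1;
}
```

A loop like the one in convertArgsToNumbers could call parseInt on each element of argv and stop with an error message the first time it returns 0.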

## Reading individual items from files

We can read an unknown number of items from a file by using functions found in the scanner module. When doing so, we always follow this pattern:

```
open the file
read the first item
while the read was good
    process the item
    read the next item
close the file
```

In C, we tell if the read was good by checking the file pointer for the end-of-file condition. We do this with the feof function, which returns true if the most recent read ran into the end of the file and false otherwise. With that detail, the while loop is refined to:

```
FILE *fp = fopen(fileName,"r");
//read the first item
...
while (!feof(fp))
{
    //process the item
    ...
    //read the next item
    ...
}
```
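The pattern above depends on the scanner functions. For comparison, here is a minimal sketch that uses only the standard library and assumes the file holds whitespace-separated integers. The name sumIntegers is hypothetical. Instead of calling feof, it tests the return value of fscanf, which reports how many items were successfully converted.

```c
#include <stdio.h>

// Sketch: sums the whitespace-separated integers available on fp.
// fscanf returns the number of items it converted, so a result of 1
// means the read was good; anything else ends the loop.
long
sumIntegers(FILE *fp)
{
    long total = 0;
    int number;

    while (fscanf(fp,"%d",&number) == 1)    //read, then test the read
    {
        total += number;                    //process the item
    }
    return total;
}
```

Folding the read into the loop test means there is no separate first read, and the loop also stops if it meets something that is not an integer.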

## Processing files a line at a time

Here is a function that reads all the lines in a file and writes them to another file, one line at a time. In addition, the function returns the number of lines processed. It makes use of the readLine function in the scanner module.

```
int
copyFile(char *inFile,char *outFile)
{
    FILE *in = fopen(inFile,"r");       //check for failed open omitted
    FILE *out = fopen(outFile,"w");     //check for failed open omitted
    int count = 0;
    char *line;

    line = readLine(in);                //read the first line
    while (!feof(in))
    {
        ++count;
        fprintf(out,"%s\n",line);
        free(line);                     //we're done with line, so free it
        line = readLine(in);            //read the next line
    }

    fclose(in);
    fclose(out);

    return count;
}
```

Notice we used the counting pattern, augmented by printing out the current line to the output file every time the count was incremented. Since the readLine function allocates space for the line it reads, we must free it when we are done with it. Note that you only need to free items read with the readLine, readToken, and readString functions. You should only free these items when you are absolutely done with them.
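For comparison, line-at-a-time copying can also be sketched with the standard fgets function and a fixed-size buffer. The names copyLines and CHUNK below are hypothetical, and the buffer size is an arbitrary choice. A line longer than the buffer arrives in several pieces, so the sketch counts a line only when a piece ends in a newline.

```c
#include <stdio.h>
#include <string.h>

#define CHUNK 256   //assumed buffer size; long lines arrive in pieces

// Sketch: copies in to out and returns the number of lines copied.
// A final line that lacks a newline still counts as a line.
int
copyLines(FILE *in,FILE *out)
{
    char buffer[CHUNK];
    int count = 0;
    int midLine = 0;                 //true if the last piece had no newline

    while (fgets(buffer,sizeof(buffer),in) != 0)
    {
        size_t length = strlen(buffer);
        fputs(buffer,out);
        midLine = (length == 0 || buffer[length-1] != '\n');
        if (!midLine) ++count;       //a newline completes a line
    }
    if (midLine) ++count;            //count an unterminated last line
    return count;
}
```

Because the buffer is an ordinary local array, nothing in this version needs to be freed. The trade-off is that the caller never sees a whole line as a single string, which readLine provides.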

## Using functions in the scanner module

Recall that the scanner module contains a number of useful functions for reading input. To use these functions, you must first download the module (see the first section of this chapter). Then, you need to tell the compiler what those functions look like before you call them. You do that by including scanner.h in any source code file that calls scanner functions:

```
#include "scanner.h"
```

Finally, you need to compile the scanner source in with your main program:

```
gcc -Wall -g mainProgram.c scanner.c
```

Here is a program that utilizes the copyFile function defined above:

```
//test (compile with scanner.c and moreInput.c)
#include "scanner.h"
#include "moreInput.h" //contains prototype for copyFile

int count;

count = copyFile("copycat.c","junk.c");
printf("%d lines copied.\n",count);
```

If this program is placed in a file copycat.c, compiled, and then run:

```
$ gcc -Wall -g -o copycat copycat.c scanner.c moreInput.c
$ copycat
```

Then you should find a copy of copycat.c in the file junk.c.

The same patterns that are commonly used for processing arrays can be used for reading input. For example, here is a function that implements a filtered count:

```
int
countSmallNumbers(char *fileName)
{
    FILE *fp = fopen(fileName,"r");    //check for failed open omitted
    int count;
    int number;

    count = 0;
    number = readInt(fp);              //make the first read
    while (!feof(fp))                  //check if the read was good
    {
        if (number < SIZE_LIMIT)       //smaller than SIZE_LIMIT, then small!
            ++count;
        number = readInt(fp);          //read the next item
    }
    fclose(fp);                        //always close files when done
    return count;
}
```

Note the use of the standard reading pattern: opening the file, making the first read, testing if the read was good, processing the item read (by counting it), reading the next item, and finally closing the file after the loop terminates. Using the scanner functions always means performing these steps, as given in the comments.

## Reading Items into an Array

Note that the countSmallNumbers function is doing two things: reading the numbers and counting the small ones. We say that this function has two concerns, reading and counting. A fundamental principle of Computer Science is the separation of concerns. To separate the concerns, we have one function read the items, storing them into an array (reading and storing is considered to be a single concern). We then have another function count the small items in the array. Thus, we will have separated the two concerns into separate functions, each with its own concern.

Reading in an undetermined number of items from a file is problematic. We don't know how big to allocate our array that will hold the items that we read. We have three choices:

1. allocate a really big array and stop reading if it fills up
2. read the file twice, the first time to count, the second to store
3. grow the array if it fills up

The first choice is a very poor one. The second works well, but is inefficient. Still, it is a viable strategy, so let's explore that approach.

### Reading a file to size an array

For this example, we will count the number of short tokens in a file. To count the tokens in a file, we modify the countSmallNumbers function above to read tokens instead of integers and remove the if that guards the increment of the count. Call this function countTokens.

```
int
countTokens(char *fileName)
{
    FILE *fp = fopen(fileName,"r");    //check for failed open omitted
    int count;
    char *token;

    count = 0;
    token = readToken(fp);             //make the first read
    while (!feof(fp))                  //check if the read was good
    {
        ++count;
        free(token);                   //we've counted it, so free it
        token = readToken(fp);         //read the next token
    }
    fclose(fp);                        //always close files when done
    return count;
}
```

Once we have counted the token, then we can free it. We call countTokens to get the number of tokens in the file:

```
int count = countTokens(fileName);
```

Next, we allocate the array with exactly the right number of slots:

```
char **tokens = malloc(sizeof(char *) * count);
```

Finally, we fill the array with a call to a procedure named fillArrayWithTokens:

```
fillArrayWithTokens(fileName,tokens);
```

The fillArrayWithTokens function is left as an exercise for the reader.
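Without giving away the exercise, the two-pass strategy can be sketched for a simpler case: a file of integers read with the standard fscanf function. The name readIntegersTwice is hypothetical, and the sketch takes an already opened file so that it can return to the start with rewind rather than reopening the file.

```c
#include <stdio.h>
#include <stdlib.h>

// Sketch of the two-pass strategy for a file of integers.
// Pass one counts the integers; pass two stores them.
// Returns a malloc'd array (the caller frees it) and sets *size.
int *
readIntegersTwice(FILE *fp,int *size)
{
    int count = 0;
    int number;
    int i;
    int *items;

    while (fscanf(fp,"%d",&number) == 1)      //first pass: count
        ++count;

    //always ask for at least one slot so an empty file is not an error
    items = malloc(sizeof(int) * (count > 0 ? count : 1));
    if (items == 0) { *size = 0; return 0; }

    rewind(fp);                               //back to the start of the file
    for (i = 0; i < count; ++i)               //second pass: store
        if (fscanf(fp,"%d",&items[i]) != 1) break;

    *size = i;                                //number of items actually stored
    return items;
}
```

Rewinding only works for ordinary files. Input that arrives from the keyboard or a pipe cannot be read twice, which is one reason the growing-array strategy is attractive.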

### Resizing an array as needed

More sophisticated C programs use the third choice. The scanner module contains a function that will help us do that, reallocate. The reallocate function is merely a wrapper for the realloc function, much in the same way allocate is a wrapper for malloc. When given a previously allocated array and a new size, realloc and reallocate produce an array of the given size that holds the elements of the old array (as many as will fit), moving the elements to a new location if necessary. These routines return the resized array, so the old array variable is usually reassigned with the return value. For example, to reallocate an integer array items with an original size of size to a new size of newSize, we would do something like this:

```
size = newSize;                                //reset the size
items = reallocate(items,sizeof(int) * size);  //reset the array
```

The realloc and reallocate functions can be used to shrink an array as well as grow it.
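When calling realloc directly, one detail deserves care: realloc returns a null pointer if it cannot satisfy the request, and in that case the old array is still valid. The following is a minimal sketch of a growable integer array that handles this case. The name appendInt and the starting size of 4 are assumptions made for illustration. How reallocate treats a failed request is not shown in this chapter, so the sketch uses plain realloc.

```c
#include <stdlib.h>

// Sketch: appends value to a growable int array.
// *items is the array, *count the slots in use, *size the slots allocated.
// Returns 1 on success; returns 0 and leaves the array untouched on failure.
int
appendInt(int **items,int *count,int *size,int value)
{
    if (*count == *size)                       //array is full!
    {
        int newSize = (*size == 0) ? 4 : *size * 2;
        int *bigger = realloc(*items,sizeof(int) * newSize);
        if (bigger == 0) return 0;             //old array is still valid
        *items = bigger;
        *size = newSize;
    }
    (*items)[*count] = value;
    ++*count;
    return 1;
}
```

Starting with a null array and a size of zero works because realloc behaves like malloc when it is handed a null pointer. Storing the result in a temporary (bigger) rather than directly in *items keeps the old array from being lost if the request fails.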

Here is the reading (and storing) function, which grows an array as needed using reallocate:

```
char **
readTokens(char *fileName,int *finalSize)
{
    FILE *fp = fopen(fileName,"r");     //check for failed open omitted
    int size = 10;                      //initial size of destination array
    char *token;
    int count;

    //allocate the destination array
    //char ** is a pointer to an array of strings (tokens are strings)
    char **items = allocate(sizeof(char *) * size);

    count = 0;
    token = readToken(fp);              //read the first token
    while (!feof(fp))
    {
        if (count == size)              //array is full!
        {
            // grow the array by doubling its size
            size = size * 2;
            items = reallocate(items,sizeof(char *) * size);
            //now there is enough room
        }
        items[count] = token;           //DO NOT FREE THE TOKEN!
        ++count;
        token = readToken(fp);          //read the next token
    }
    fclose(fp);

    //shrink the array to 'count' number of elements
    items = reallocate(items,sizeof(char *) * count);

    //count holds the number of items, store it in *finalSize
    *finalSize = count;
    return items;
}
```

The readTokens function is complicated by the fact that it needs to return two things, the array of tokens that were found in the file, plus the size of the array. Unfortunately, C only allows a function to return one thing. So, we have chosen to return the array and to update one of the arguments to the function, which we do on the second to the last line of the function. To use readTokens, we need to define a variable to hold the size of the array and a variable to point to that array:

```
int size;
char **tokenArray;

tokenArray = readTokens(fileName,&size);

//size now holds the number of elements in the tokenArray
//tokenArray points to the array of tokens (each token is a string)
```

Another thing that is different about the readTokens function is we do not free the token after it has been read. Instead, we store it in the destination array and leave it to the caller of the function to free the tokens and the array itself when it is done with them.
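What the caller's cleanup might look like can be sketched as follows. The name freeTokens is hypothetical, and the sketch assumes every token in the array was individually allocated, as tokens returned by readToken are.

```c
#include <stdlib.h>

// Hypothetical helper: releases an array of tokens such as the one
// returned by readTokens, along with every token stored in it.
void
freeTokens(char **tokens,int size)
{
    int i;
    for (i = 0; i < size; ++i)     //free the tokens first
        free(tokens[i]);
    free(tokens);                  //now free the array itself
}
```

The order matters: once the array is freed, the pointers to the individual tokens are no longer reachable, so the tokens must be released first.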

Finally, note that readTokens shrinks its array to its final size after filling and growing. This is because as much as 50% of the array might not be used.

Next, we implement the function to perform the filtered counting. Instead of passing the file name, as before, we pass the array of tokens that were read, plus the size of that array:

```
int
countShortTokensInArray(char **items,int size)
{
    int i,count;

    count = 0;
    for (i = 0; i < size; ++i)
    {
        if (strlen(items[i]) < SHORT_LIMIT)
            count += 1;
    }
    return count;
}
```

The countShortTokensInArray function is now simpler than the original function. This makes it easier to fix any errors in the function since you can concentrate on the single concern implemented by that function.

We can test our two functions with the following code:

```
//test (compile with scanner.c and moreInput.c)
#include "scanner.h"
#include "moreInput.h"

int count;
int size;
char **tokens;

count = countShortTokens("tester.c");
printf("counting tokens directly: %d tokens\n",count);

tokens = readTokens("tester.c",&size);
count = countShortTokensInArray(tokens,size);
printf("counting tokens in two stages: %d tokens\n",count);
```

## Reading Records into an Array

Often, data in a file is organized as records, where a record is just a collection of consecutive and related tokens. Each token in a record is known as a field. Suppose every four tokens in a file comprises a record:

```
"Amber Smith"       President   32   97000.05
"Ellen Thompson"    Hacker       2  147000.99
```

Typically, we define a function to read one collection of tokens at a time. Here is a function that reads a single record:

```
char **
readRecord(FILE *fp)                // we pass the file pointer in
{
    char *name,*title,*years,*salary;
    char **record;

    name = readString(fp);           //name is a string, not a token

    if (feof(fp)) { return 0; }      // no record, return null

    title = readToken(fp);
    years = readToken(fp);
    salary = readToken(fp);

    //make an empty record
    record = allocate(sizeof(char *) * 4);  //there are four fields

    //fill the record
    record[0] = name;
    record[1] = title;
    record[2] = years;
    record[3] = salary;

    return record;
}
```

Note that we return either a record as an array or the null pointer (zero) if no record was read. Note that we store every field as a string, even though the last two fields are numbers. This is because arrays are homogeneous. In a later chapter when we learn about structures, we will be able to create records that store heterogeneous data, such as years as an integer and salary as a real number.

To total up all the salaries, for example, we can use an accumulation loop (assuming the salary data resides in a file represented by the string fileName).

We do so by repeatedly calling readRecord, remembering to convert the string representing the salary in the record to a real number using atof:

```
double
totalPay(char *fileName)
{
    FILE *fp;
    char **record;
    double total;

    fp = fopen(fileName,"r");       //check for failure to open omitted
    total = 0;
    record = readRecord(fp);        //read the first record
    while (!feof(fp))               //see if the read was good
    {
        total += atof(record[3]);
        freeRecord(record);         //done with record, free it
        record = readRecord(fp);    //read the next record
    }
    fclose(fp);                     //always close file
    return total;
}
```

Note that it is the job of the caller of readRecord to open the file, repeatedly send the file pointer to readRecord, and close the file pointer when done. As before, we always check if the read was good using feof, although we could equivalently test if readRecord returns zero.

Note also the use of a function named freeRecord. If we just return the space occupied by the record with free:

```
free(record);
```

we will release the record array, but not the individual elements, which also need to be freed. So here is an implementation of freeRecord:

```
void
freeRecord(char **r)
{
    //free the fields first
    free(r[0]); //name
    free(r[1]); //title
    free(r[2]); //years
    free(r[3]); //salary
    //now free the record array
    free(r);
}
```

Both totalPay and freeRecord suffer from a stylistic flaw: they use those magic numbers we read about in the chapter on assignment. It is not clear from the code, for example, that the field at index three is the salary. A second issue for totalPay is that the function has two concerns (reading and accumulating). We will fix the magic number problem first, by placing constants representing the indices of the various fields of a record at the top of the file (usually after the includes but before the function prototypes), in the global scope so that they will be visible everywhere:

```
#define NAME 0
#define TITLE 1
#define SERVICE 2
#define SALARY 3
```

In C, we can create constants by using the `#define` preprocessor directive. Our accumulation loop now becomes:

```
total = 0;
record = readRecord(fp);
while (!feof(fp))
{
    total += atof(record[SALARY]);
    freeRecord(record);
    record = readRecord(fp);
}
```

The body of freeRecord now becomes:

```
free(r[NAME]);
free(r[TITLE]);
free(r[SERVICE]);
free(r[SALARY]);
...
```

We can also rewrite our readRecord function to use the field index constants:

```
...
//fill the record
record[NAME] = name;
record[TITLE] = title;
record[SERVICE] = years;
record[SALARY] = salary;
```

Even if someone changes the constants to:

```
#define NAME 3
#define TITLE 2
#define SERVICE 1
#define SALARY 0
```

the code still works correctly. Now the salary resides at index 0, but the accumulation loop still accumulates the salary because it uses the constant to access that field.
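As an aside, C offers a second way to name the field indices: an enumeration, which numbers its members automatically starting from zero. The sketch below is an alternative to the `#define` approach used in this chapter, not a replacement for it. The names FIELD_COUNT and salaryOf are hypothetical.

```c
#include <stdlib.h>

// Sketch: an enum numbers the fields automatically, and the final
// entry doubles as the field count. FIELD_COUNT is an assumed name.
enum { NAME, TITLE, SERVICE, SALARY, FIELD_COUNT };

// Returns the salary stored in a record as a real number.
double
salaryOf(char **record)
{
    return atof(record[SALARY]);
}
```

With this arrangement, the 4 used when allocating a record could be written as FIELD_COUNT, so adding a fifth field to the enumeration would update the count as well.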

## Creating an Array of Records

We can separate the two concerns of the totalPay function by having one function read the records into an array and having another total up the salaries. An array of records is known as a table. Creating the table is just like creating the record, but instead of storing tokens in a record array, we store records in a table array. Because a token is an array of characters and characters have type `char`, a token has type `char *`. Every time we make an array to hold some type X, the type of the array is X `*`. A record, which is an array of tokens, has type `char **`. We simply add a star for each level of array. Since a table is an array of records, its type is `char ***`. If we wanted an array of tables, what would that array's type be?

Recall the readTokens function that stored tokens in a growing array. Here is a readTable function, which takes the same approach:

```
char ***
readTable(char *fileName,int *finalSize)
{
    FILE *fp;
    int count;
    int size = 10;                  //initial size of destination array
    char **record;
    char ***table;

    fp = fopen(fileName,"r");     //check for failed open omitted

    //allocate the destination array
    table = allocate(sizeof(char **) * size);

    count = 0;
    record = readRecord(fp);            //read the first record
    while (!feof(fp))
    {
        if (count == size)              //array is full!
        {
            // grow the array by doubling its size
            size = size * 2;
            table = reallocate(table,sizeof(char **) * size);
            //now there is enough room
        }
        table[count] = record;           //DO NOT FREE THE RECORD!
        ++count;
        record = readRecord(fp);         //read the next record
    }
    fclose(fp);

    //shrink the array to 'count' number of elements
    table = reallocate(table,sizeof(char **) * count);

    //count holds the number of items, store it in *finalSize
    *finalSize = count;
    return table;
}
```

The accumulation function is straightforward, now that records have been stored in an array:

```
double
totalPay2(char ***table,int size)
{
    int i;
    double total = 0;

    for (i = 0; i < size; ++i)
    {
        char **record = table[i];
        total += atof(record[SALARY]);
    }

    return total;
}
```

We can simplify this function by removing the temporary variable record:

```
double
totalPay2(char ***table,int size)
{
    int i;
    double total = 0;

    for (i = 0; i < size; ++i)
    {
        total += atof(table[i][SALARY]);
    }
    return total;
}
```

Since a table is just an array, we can walk it, accumulate items in each record (as we just did with salary), filter it, and so on.
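As an illustration of filtering a table, here is a minimal sketch of a filtered count over the records. The name countHighSalaries and the limit parameter are hypothetical. SALARY is defined inside the block only so that the sketch is self-contained; it has the same value as the constant defined earlier.

```c
#include <stdlib.h>

#define SALARY 3    //index of the salary field, as defined earlier

// Sketch: counts the records whose salary exceeds limit.
int
countHighSalaries(char ***table,int size,double limit)
{
    int i;
    int count = 0;

    for (i = 0; i < size; ++i)
        if (atof(table[i][SALARY]) > limit)    //the filter
            ++count;
    return count;
}
```

The loop has the same shape as the one in totalPay2. Only the body changes, from accumulating a sum to conditionally incrementing a count.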

When we are done with the table, we need to free it. As with freeRecord, we cannot just free the table array, we also need to free what the array holds. Fortunately, the table holds records and we have already written freeRecord:

```
void
freeTable(char ***table,int size)
{
    int i;
    //free the records first
    for (i = 0; i < size; ++i)
    {
        freeRecord(table[i]);
    }
    //now free the table array
    free(table);
}
```

Tying it all together into a program, and putting some employment records into a file named employees.dat gives us:

```
//test (compile with scanner.c and moreInput.c)
#include "scanner.h"
#include "moreInput.h"

int size;
char ***table = readTable("employees.dat",&size);
double total = totalPay2(table,size);

printf("total salary: $%.2f\n",total);
```

If employees.dat just contains the records of employees Smith, Jones, and Thompson above, then the output from this program should be:

```
total salary: $333001.46
```

lusth@cs.ua.edu
