
Strings

As stated in a previous chapter, strings in C are really arrays of characters. For example, the two variables s1 and s2 are very similar:

    //test
    char *s1 = "hello";
    char s2[6] = { 'h', 'e', 'l', 'l', 'o', '\0' };
    printf("%s\n",s1);
    printf("%s\n",s2);

Both print statements produce the word hello. The difference is that s1 is a true pointer while s2 is a pseudopointer. We can tell this by the types of s1 and s2. The variable s1 has the type char *, which means it holds the address of a memory location that stores a character. The variable s2 has the type char[6]. Since s1 is a true pointer, it can point to any number of different characters. The program above points it to the first character in the literal "hello", but it can be updated to point to different locations as the program executes. On the other hand, s2 will always refer to the first of the six characters that were allocated to it. While the characters might change, the location that s2 references will never change.

A strange thing in the code above is that s2 is an array of six characters, while counting the characters between the quotes in the initializer for s1, "hello", yields five. The sixth slot of s2 (at index 5, remember your zero-based counting!) is initialized to a rather funny looking character, '\0'. This character is sometimes pronounced backslash-zero or, more often, called the null character or the null byte. In C, strings are terminated by the null character, so both s1 and s2 in our example point to arrays of six slots: the five letters of hello followed by the null character.

For s1, the C compiler adds a terminating null character to literal strings such as "hello". When we build a string with an array ourselves, we are in charge of getting things right: we must both make room for the null character and place it into the array.

Note: you can safely allocate a static array of characters, making sure there is room for the null character with initializations like:

    char s3[] = "hello";

When you do this, the C compiler will determine the length of the string and allocate space for all the characters in the string plus the null character.

Printing strings

When we wish to print a string, we use the %s format directive in the guide string:

    //test
    char *s = "The integer directive is %d";
    printf("%s\n",s);               //should print: The integer directive is %d

Often, we wish to build a guide string and use it in a print statement:

    //test
    int z = 5;
    char *s = "The result is %d\n";
    printf(s,z*z);                  //should print: The result is 25

From this example, we can see that printf (like any function that takes a string) is quite flexible in terms of what kinds of data we can send it. For the guide string, we can send a literal string, a character pointer, or an array of characters. Of course, with the latter two, the characters have to be terminated with the null character.

One difference between literal strings and arrays of characters is that one can change a character in an array of characters using bracket notation. For literal strings, attempting to change a character will cause undefined behavior. For example:

    char *r = "rat";                    //r points to a literal string
    char s[] = { 'r', 'a', 't', '\0'};  //s pseudopoints to an array of char
    char *t = r;                        //t points to a literal string
    char *u = s;                        //u points to an array of char

    r[0] = 'c';     //BAD
    s[0] = 'c';     //OK
    t[0] = 'c';     //BAD
    u[0] = 'c';     //OK

From the code above, we can see it doesn't matter if we use the name of an array or a pointer to the same array; we are allowed to change the contents of the array. Likewise, we are not allowed to change the contents of a literal string, regardless of how we access it.

Determining string length

The C standard library provides a function for determining the length of a string, strlen, which is declared in the header string.h. The strlen function returns the number of characters in the given string, not counting the null character. For example,

    //test
    char s[6] = { 'w', 'o', 'r', 'l', 'd', '\0' };
    char *t = "world";
    //calculate!
    printf("the length of s is %zu\n",strlen(s));
    printf("the length of t is %zu\n",strlen(t));

The output of the above code is:

    the length of s is 5
    the length of t is 5

There are actually six characters in s and t, so when using strlen to decide how much room a string needs, remember to add one more to account for the null character.

A quick question for you regarding strlen. In the example above, what character does s[strlen(s)] reference? Is it the last letter of the word ('d') or the null character? Since array indices start at zero, it is always important to account for this when using strlen.

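You likely noticed the use of the %zu format specifier in the guide string to print the return value of strlen. Like sizeof, strlen produces a value of type size_t, an unsigned integer type that is typically 32 bits wide on a 32-bit machine and 64 bits wide on a 64-bit machine. The %zu directive matches size_t whatever its width.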

In order for strlen to work properly on character arrays, they must be terminated by the null character. Consider this code:

    char s[5] = { 'w', 'o', 'r', 'l', 'd' };
    printf("the length of s is %zu\n",strlen(s));

The result of running the strlen function in this case is undefined. This is because strlen will count characters until it encounters the null character. Since there is no null character specified, strlen will cruise through memory beyond the end of the array until it finds one accidentally. If it finds one, it will report an erroneous length. If it wanders into memory the program is not allowed to read, your program may crash.

Copying strings

Sometimes it is useful to copy a string. The following code does not copy. Rather, it sets up an alias:

    //test
    char *s = "rat";
    char *t = s;                //t is an alias for s
    printf("%s\n",t);           //should print: rat

To make a copy, we need to allocate an array and then use the strncpy function to copy the string into the array. The strncpy function takes an array that receives the characters being copied, the string to be copied, and an integer that limits the number of characters to be copied. Here is an example:

    //test
    char *s = "rat";
    char t[4];
    //calculate!
    strncpy(t,s,4);         //magic number, s has four characters!
    t[0] = 'c';             //OK, t is an array
    printf("%s\n",s);       //should print rat
    printf("%s\n",t);       //should print cat

Here the output of the first print statement is rat, showing that changing the first character of t had no effect on s. The second print statement produces cat, indicating that the copy performed by strncpy worked.

One common mistake made with this kind of code is to specify one character too few when giving the number of characters to be copied, i.e., forgetting to count the null character. When that happens, strncpy does not write the null character and t is left unterminated. A version that avoids the magic number uses strlen:

    strncpy(t,s,strlen(s)+1);   //add 1 for the null byte

A second common mistake is to allocate the destination array with too few slots. A safer version of this code copies as many characters as allowed, then writes a null byte in the last slot of the array (SLOTS is a macro, defined in the next example, that computes the number of slots in an array):

    strncpy(t,s,SLOTS(t));
    t[SLOTS(t)-1] = '\0';

Even this code is problematic, in that if t is too small, the string in t will be a prefix of the string in s:

    //test
    #define SLOTS(a) (sizeof(a)/sizeof(*(a)))
    char *s = "rat";
    char t[3];              //t is too small!
    //calculate!
    strncpy(t,s,SLOTS(t));
    t[SLOTS(t)-1] = '\0';
    printf("%s\n",s);       //should print rat
    printf("%s\n",t);       //should print ra

Instead of using a statically allocated array to hold a string, we can use a dynamically allocated array:

    //test
    char *s = "rat";
    char *t = malloc(strlen(s) + 1); //remember the null character!
    // check to see that malloc was successful
    if (t == 0)
        {
        fprintf(stderr,"allocation of t failed\n");
        exit(1);
        }
    //calculate!
    strncpy(t,s,strlen(s) + 1);
    t[0] = 'c';             //OK, t points to an array
    printf("%s\n",s);       //should print rat
    printf("%s\n",t);       //should print cat

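With dynamic allocation, the destination array is sized to fit the string being copied, so the copy is neither cut short nor written past the end of the array. The price is that the allocated memory should be released with free once the copy is no longer needed.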

One final note on copying strings. We used the strncpy routine and specifically watched how many characters were being copied. There is also a strcpy routine that simply copies a string without doing any checking to make sure you have the space necessary to make the copy. Since one of the easiest ways to get in trouble with C programming is to overwrite something, we recommend you always make sure you have room to store whatever it is that you are copying.

Comparing strings

Can one string be "less than" another, in the same way that one number can be less than another? It turns out the answer is yes. In fact, the ability to say one string is less than another allows us to quickly look up words in a dictionary. In a dictionary, the word aardvark comes before zebra, because the letter a, which starts aardvark, comes before z, which starts zebra, in the alphabet.

If we were to ask:

    //test
    char *w1 = "zebra";
    char *w2 = "aardvark";
    printf("%d\n",w2 < w1);     //may print true: 1, but likely false: 0

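the output may surprise us. This is because we did not ask whether w2 comes before w1 in the dictionary. Recall that w1 and w2 are pointers. Thus, the question we actually asked was whether the address held in w2 is less than the address held in w1. The answer depends on where the compiler happens to place the two literals; strictly speaking, standard C leaves the result of ordering two pointers into different arrays undefined.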

We have a special-purpose function for answering dictionary questions. It is called strcmp. When given two strings, strcmp returns 0 if they are composed of the same characters in the same order, a negative integer if the first string precedes the second in a dictionary, and a positive integer if the first string follows the second:

    //test
    char *w1 = "zebra";
    char *w2 = "aardvark";
    printf("%d\n",strcmp(w2,w1) < 0);       //should print true: 1

We have thrown around the term dictionary rather cavalierly. C's dictionary ordering is a bit different than what we are used to. For example, in C's alphabet, a capital Z comes before a lower-case a. Therefore, when we capitalize Zebra:

    //test
    char *w1 = "Zebra";
    char *w2 = "aardvark";
    printf("%d\n",strcmp(w2,w1) < 0);       //should print false: 0

we get the opposite result from before.

Several chapters ago, we mentioned an ASCII table for characters. This table is what C uses for its ordering. If you find this table online, you can easily see why a capital-Z (ASCII value 90) is considered less than a lowercase-a (ASCII value 97).

Converting strings to numbers

The function atoi can convert a string representing an integer to an actual integer:

    //test
    int i;
    char *s;
    //calculate!
    s = "123";
    i = atoi(s);
    printf("s+1 is %s\n",s+1);   //should print 23
    printf("i+1 is %d\n",i+1);   //should print 124

The first print statement prints 23 because s+1 skips over the first character. The second prints 124 because atoi converts the string "123" into the integer 123. Adding one to the integer increments the value to 124. If you give atoi a string that doesn't look like an integer, it will return 0.

The atof function can be used to convert a string into a real number:

    double r = atof("3.14159");

Like atoi, if you give atof a string that does not look like a number, it will return 0.0.

lusth@cs.ua.edu

