Consolidated lecture notes for CS223 as taught in the Spring 2005 semester at Yale by Jim Aspnes.

Contents

  1. C/Variables
  2. Machine memory
  3. Variables
    1. Variable declarations
    2. Variable names
  4. Using variables
  5. C/IntegerTypes
  6. Integer types
    1. C99 fixed-width types
  7. Integer constants
  8. Integer operators
    1. Arithmetic operators
    2. Bitwise operators
    3. Logical operators
    4. Relational operators
  9. Input and output
  10. Alignment
  11. C/Statements
  12. Simple statements
  13. Compound statements
    1. Conditionals
    2. Loops
      1. The while loop
      2. The do..while loop
      3. The for loop
      4. Loops with break, continue, and goto
    3. Choosing where to put a loop exit
  14. C/AssemblyLanguage
  15. C/Functions
  16. Function definitions
  17. Calling a function
  18. The return statement
  19. Function declarations and modules
  20. Static functions
  21. Local variables
  22. Mechanics of function calls
  23. C/InputOutput
  24. Character streams
  25. Reading and writing single characters
  26. Formatted I/O
  27. Rolling your own I/O routines
  28. File I/O
  29. C/Pointers
  30. Memory and addresses
  31. Pointer variables
    1. Declaring a pointer variable
    2. Assigning to pointer variables
    3. Using a pointer
    4. Printing pointers
  32. The null pointer
  33. Pointers and functions
  34. Pointer arithmetic and arrays
    1. Arrays and functions
    2. Multidimensional arrays
    3. Variable-length arrays
  35. Void pointers
  36. Run-time storage allocation
  37. The restrict keyword
  38. C/DynamicStorageAllocation
  39. Dynamic storage allocation in C
  40. Overview
  41. Allocating space with malloc
    1. Testing the return value of malloc
  42. Freeing space with free
    1. Storage allocation discipline
    2. Perils of buffer overflows
    3. The lingering curse of referencing freed blocks
  43. Dynamic arrays using realloc
  44. Example of building a data structure with malloc
      1. malloc2d.c
      2. Makefile
      3. Output of make test
  45. C/Strings
  46. String processing in general
  47. C strings
  48. String constants
  49. String buffers
  50. Operations on strings
  51. Finding the length of a string
    1. The strlen tarpit
  52. Comparing strings
  53. Formatted output to strings
  54. Dynamic allocation of strings
  55. argc and argv
  56. C/Structs
  57. Structs
  58. Unions
  59. Bit fields
  60. AbstractDataTypes
  61. Abstraction
  62. Example of an abstract data type
    1. Interface
      1. sequence.h
    2. Implementation
      1. sequence.c
    3. Compiling and linking
      1. main.c
      2. Makefile
  63. Designing abstract data types
    1. Parnas's Principle
    2. When to build an abstract data type
  64. C/Definitions
  65. Naming types
  66. Naming constants
  67. Naming values in sequences
  68. Other uses of #define
  69. AsymptoticNotation
  70. Definitions
  71. Motivating the definitions
  72. Proving asymptotic bounds
  73. Asymptotic notation hints
    1. Remember the difference between big-O, big-Ω, and big-Θ
    2. Simplify your asymptotic terms as much as possible
    3. Remember the limit trick
  74. Variations in notation
    1. Absolute values
    2. Abusing the equals sign
  75. More information
  76. LinkedLists
  77. Stacks and linked lists
    1. Implementation
    2. A more complicated implementation
    3. Building a stack out of an array
  78. Looping over a linked list
  79. Looping over a linked list backwards
  80. Queues
  81. Deques and doubly-linked lists
  82. Circular linked lists
  83. What linked lists are and are not good for
  84. Further reading
  85. C/Debugging
  86. Debugging in general
  87. Assertions
  88. gdb
    1. My favorite gdb commands
    2. Debugging strategies
  89. Valgrind
    1. Compilation flags
    2. Automated testing
    3. Examples of some common valgrind errors
      1. Uninitialized values
      2. Bytes definitely lost
      3. Invalid write or read operations
  90. Not recommended: debugging output
  91. TestingDuringDevelopment
  92. Setting up the test harness
    1. Module interface
      1. stack.h
    2. Test code
      1. test-stack.c
    3. Makefile
      1. Makefile
  93. Stub implementation
    1. stack.c
  94. Bounded-space implementation
    1. stack.c
  95. First fix
  96. Final version
    1. stack.c
  97. Moral
  98. Appendix: Test macros
    1. tester.h
    2. tester.c
  99. LinkedLists
  100. Stacks and linked lists
    1. Implementation
    2. A more complicated implementation
    3. Building a stack out of an array
  101. Looping over a linked list
  102. Looping over a linked list backwards
  103. Queues
  104. Deques and doubly-linked lists
  105. Circular linked lists
  106. What linked lists are and are not good for
  107. Further reading
  108. HashTables
  109. Hashing: basics
  110. Universal hash families
  111. FKS hashing with O(s) space and O(1) worst-case search time
  112. Cuckoo hashing
    1. Structure
  113. Bloom filters
  114. C/FunctionPointers
  115. Basics
  116. Function pointer declarations
  117. Applications
    1. Callbacks
    2. Dispatch tables
    3. Iterators
  118. Closures
  119. Objects
  120. C/PerformanceTuning
  121. Timing under Linux
  122. Profiling with gprof
  123. AlgorithmDesignTechniques
  124. Basic principles of algorithm design
  125. Classifying algorithms by structure
  126. Example: Finding the maximum
  127. Example: Sorting
  128. DynamicProgramming
  129. Memoization
  130. Dynamic programming
    1. More examples
      1. Longest increasing subsequence
      2. All-pairs shortest paths
      3. Longest common subsequence
  131. Dynamic programming: algorithmic perspective
    1. Preserving alternatives
    2. Knapsack
    3. Non-linear structures
      1. Trees
      2. Treewidth
        1. Examples
        2. Treewidth and dynamic programming
  132. BinaryTrees
  133. Tree basics
  134. Binary tree implementations
  135. The canonical binary tree algorithm
  136. Nodes vs leaves
  137. Special classes of binary trees
  138. Heaps
  139. Priority queues
  140. Expensive implementations of priority queues
  141. Heaps
  142. Packed heaps
  143. Bottom-up heapification
  144. Heapsort
  145. More information
  146. BinarySearchTrees
  147. Searching for a node
  148. Inserting a new node
  149. Costs
  150. BalancedTrees
  151. The basics: tree rotations
  152. AVL trees
  153. 2–3 trees
  154. Red-black trees
  155. B-trees
  156. Splay trees
  157. Skip lists
  158. Implementations
  159. RadixSearch
  160. Tries
    1. Searching a trie
    2. Inserting a new element into a trie
    3. Implementation
  161. Patricia trees
  162. Ternary search trees
  163. More information
  164. C/BitExtraction
  165. C/Graphs
  166. Graphs
  167. Why graphs are useful
  168. Operations on graphs
  169. Representations of graphs
    1. Adjacency matrices
    2. Adjacency lists
      1. An implementation
    3. Implicit representations
  170. Searching for paths in a graph
    1. Depth-first and breadth-first search
    2. Other variations on the basic algorithm
  171. DepthFirstSearch
  172. Depth-first search in a tree
  173. Depth-first search in a directed graph
  174. DFS forests
  175. Running time
  176. A more complicated application: strongly-connected components
  177. BreadthFirstSearch
  178. ShortestPath
  179. Single-source shortest paths
    1. Relaxation
    2. Dijkstra's algorithm
    3. Bellman-Ford
  180. All-pairs shortest paths
  181. Implementations
  182. C/FloatingPoint
  183. Floating point basics
  184. Floating-point constants
  185. Operators
  186. Conversion to and from integer types
  187. The IEEE-754 floating-point standard
  188. Error
  189. Reading and writing floating-point numbers
  190. Non-finite numbers in C
  191. The math library
  192. C/Persistence
  193. Persistent data
  194. A simple solution using text files
  195. Using a binary file
  196. A version that updates the file in place
  197. An even better version using mmap
  198. Concurrency and fault-tolerance issues: ACIDitiy
  199. C/WhatNext
  200. What's wrong with C
  201. What C++ fixes
  202. Other C-like languages to consider
  203. Scripting languages

1. C/Variables

2. Machine memory

Basic model: machine memory consists of many bytes of storage, each of which has an address which is itself a sequence of bits. Though the actual memory architecture of a modern computer is complex, from the point of view of a C program we can think of as simply a large address space that the CPU can store things in (and load things from), provided it can supply an address to the memory. Because we don't want to have to type long strings of bits all the time, the C compiler lets us give names to particular regions of the address space, and will even find free space for us to use.

3. Variables

A variable is a name given in a program for some region of memory. Each variable has a type, which tells the compiler how big the region of memory corresponding to it is and how to treat the bits stored in that region when performing various kinds of operations (e.g. integer variables are added together by very different circuitry than floating-point variables, even though both represent numbers as bits). In modern programming languages, a variable also has a scope (a limit on where the name is meaningful, which allows the same name to be used for different variables in different parts of the program) and an extent (the duration of the variable's existence, controlling when the program allocates and deallocates space for it).

3.1. Variable declarations

Before you can use a variable in C, you must declare it. Variable declarations show up in three places:

  • Outside a function. These declarations declare global variables that are visible throughout the program (i.e. they have global scope). Use of global variables is almost always a mistake.

  • In the argument list in the header of a function. These variables are parameters to the function. They are only visible inside the function body (local scope), exist only from when the function is called to when the function returns (bounded extent—note that this is different from what happens in some garbage-collected languages like Scheme), and get their initial values from the arguments to the function when it is called.

  • At the start of any block delimited by curly braces. Such variables are visible only within the block (local scope again) and exist only when the containing function is active (bounded extent). The convention in C is has generally been to declare all such local variables at the top of a function; this is different from the convention in C++ or Java, which encourage variables to be declared when they are first used. This convention may be less strong in C99 code, since C99 adopts the C++ rule of allowing variables to be declared anywhere (which can be particularly useful for index variables in for loops).

Variable declarations consist of a type name followed by one or more variable names separated by commas and terminated by a semicolon (except in argument lists, where each declaration is terminated by a comma). I personally find it easiest to declare variables one per line, to simplify documenting them. It is also possible for global and local variables (but not function arguments) to assign an initial value to a variable by putting in something like = 0 after the variable name. It is good practice to put a comment after each variable declaration that explains what the variable does (with a possible exception for conventionally-named loop variables like i or j in short functions). Below is an example of a program with some variable declarations in it:

   1 #include <stdio.h>
   2 #include <ctype.h>
   3 
   4 /* This program counts the number of digits in its input. */
   5 
   6 /*
   7  *This global variable is not used; it is here only to demonstrate
   8  * what a global variable declaration looks like.
   9  */
  10 unsigned long SpuriousGlobalVariable = 127;
  11 
  12 int
  13 main(int argc, char **argv)
  14 {
  15     int c;              /* character read */
  16     int count = 0;      /* number of digits found */
  17 
  18     while((c = getchar()) != EOF) {
  19         if(isdigit(c)) {
  20             count++;
  21         }
  22     }
  23 
  24     printf("%d\n", count);
  25 
  26     return 0;
  27 }
variables.c

3.2. Variable names

The evolution of variable names in different programming languages:

11101001001001
Physical addresses represented as bits.
#FC27
Typical assembly language address represented in hexadecimal to save typing (and because it's easier for humans to distinguish #A7 from #B6 than to distinguish 10100111 from 10110110.)
A1$
A string variable in BASIC, back in the old days where BASIC variables were one uppercase letter, optionally followed by a number, optionally followed by $ for a string variable and % for an integer variable. These type tags were used because BASIC interpreters didn't have a mechanism for declaring variable types.
IFNXG7

A typical FORTRAN variable name, back in the days of 6-character all-caps variable names. The I at the start means it's an integer variable. The rest of the letters probably abbreviate some much longer description of what the variable means. The default type based on the first letter was used because FORTRAN programmers were lazy, but it could be overridden by an explicit declaration.

i, j, c, count, top_of_stack, accumulatedTimeInFlight

Typical names from modern C programs. There is no type information contained in the name; the type is specified in the declaration and remembered by the compiler elsewhere. Note that there are two different conventions for representing multi-word names: the first is to replace spaces with underscores, and the second is to capitalize the first letter of each word (possibly excluding the first letter), a style called "camel case" (CamelCase). You should pick one of these two conventions and stick to it.

prgcGradeDatabase

An example of Hungarian notation, a style of variable naming in which the type of the variable is encoded in the first few character. The type is now back in the variable name again. This is not enforced by the compiler: even though iNumberOfStudents is supposed to be an int, there is nothing to prevent you from declaring float iNumberOfStudents if you expect to have fractional students for some reason. See http://web.umr.edu/~cpp/common/hungarian.html or HungarianNotation. Not clearly an improvement on standard naming conventions, but it is popular in some programming shops.

In C, variable names are called identifiers.1 An identifier in C must start with a lower or uppercase letter or the underscore character _. Typically variables starting with underscores are used internally by system libraries, so it's dangerous to name your own variables this way. Subsequent characters in an identifier can be letters, digits, or underscores. So for example a, ____a___a_a_11727_a, AlbertEinstein, aAaAaAaAaAAAAAa, and ______ are all legal identifiers in C, but $foo and 01 are not.

The basic principle of variable naming is that a variable name is a substitute for the programmer's memory. It is generally best to give identifiers names that are easy to read and describe what the variable is used for, i.e., that are self-documenting. None of the variable names in the preceding list are any good by this standard. Better names would be total_input_characters, dialedWrongNumber, or stepsRemaining. Non-descriptive single-character names are acceptable for certain conventional uses, such as the use of i and j for loop iteration variables, or c for an input character. Such names should only be used when the scope of the variable is small, so that it's easy to see all the places where it is used at once.

C identifiers are case-sensitive, so aardvark, AArDvARK, and AARDVARK are all different variables. Because it is hard to remember how you capitalized something before, it is important to pick a standard convention and stick to it. The traditional convention in C goes like this:

  • Ordinary variables and functions are lowercased or camel-cased, e.g. count, countOfInputBits.

  • User-defined types (and in some conventions global variables) are capitalized, e.g. Stack, TotalBytesAllocated.

  • Constants created with #define or enum are put in all-caps: MAXIMUM_STACK_SIZE, BUFFER_LIMIT.

4. Using variables

Ignoring pointers (C/Pointers) for the moment, there are essentially two things you can do to a variable: you can assign a value to it using the = operator, as in:

   1     x = 2;      /* assign 2 to x */
   2     y = 3;      /* assign 3 to y */

or you can use its value in an expression:

   1     x = y+1;    /* assign y+1 to x */

The assignment operator is an ordinary operator, and assignment expressions can be used in larger expressions:

    x = (y=2)*3; /* sets y to 2 and x to 6 */

This feature is usually only used in certain standard idioms, since it's confusing otherwise.

There are also shorthand operators for expressions of the form variable = variable operator expression. For example, writing x += y is equivalent to writing x = x + y, x /= y is the same as x = x / y, etc.

For the special case of adding or subtracting 1, you can abbreviate still further with the ++ and -- operators. These come in two versions, depending on whether you want the result of the expression (if used in a larger expression) to be the value of the variable before or after the variable is incremented:

   1     x = 0;
   2     y = x++;    /* sets x to 1 and y to 0 (the old value) */
   3     y = ++x;    /* sets x to 2 and y to 2 (the new value) */

The intuition is that if the ++ comes before the variable, the increment happens before the value of the variable is read (a preincrement; if it comes after, it happens after the value is read (a postincrement). This is confusing enough that it is best not to use the value of preincrement or postincrement operations except in certain standard idioms. But using x++ by itself as a substitute for x = x+1 is perfectly acceptable style.


CategoryProgrammingNotes

5. C/IntegerTypes

6. Integer types

In order to declare a variable, you have to specify a type, which controls both how much space the variable takes up and how the bits stored within it are interpreted in arithmetic operators.

The standard C integer types are:2

Name

Typical size

Signed by default?

char

8 bits

Unspecified

short

16 bits

Yes

int

32 bits

Yes

long

32 bits

Yes

long long

64 bits

Yes

The typical size is for 32-bit architectures like the Intel i386. Some 64-bit machines might have 64-bit ints and longs, and some prehistoric computers had 16-bit ints. Particularly bizarre architectures might have even wilder bit sizes, but you are not likely to see this unless you program vintage 1970s supercomputers. Some compilers also support a long long type that is usually twice the length of a long (e.g. 64 bits on i386 machines); this may or may not be available if you insist on following the ANSI specification strictly. The general convention is that int is the most convenient size for whatever computer you are using and should be used by default.

Whether a variable is signed or not controls how its values are interpreted. In signed integers, the first bit is the sign bit and the rest are the value in 2's complement notation; so for example a signed char with bit pattern 11111111 would be interpreted as the numerical value -1 while an unsigned char with the same bit pattern would be 255. Most integer types are signed unless otherwise specified; an n-bit integer type has a range from -2n-1 to 2n-1-1 (e.g. -32768 to 32767 for a short.) Unsigned variables, which can be declared by putting the keyword unsigned before the type, have a range from 0 to 2n-1 (e.g. 0 to 65535 for an unsigned short).

For chars, whether the character is signed (-128..127) or unsigned (0..255) is at the whim of the compiler. If it matters, declare your variables as signed char or unsigned char. For storing actual characters that you aren't doing arithmetic on, it shouldn't matter.

6.1. C99 fixed-width types

C99 provides a stdint.h header file that defines integer types with known size independent of the machine architecture. So in C99, you can use int8_t instead of signed char to guarantee a signed type that holds exactly 8 bits, or uint64_t instead of unsigned long long to get a 64-bit unsigned integer type. The full set of types typically defined are int8_t, int16_t, int32_t, and int64_t for signed integers and the same starting with uint for signed integers. There are also types for integers that contain the fewest number of bits greater than some minimum (e.g., int_least16_t is a signed type with at least 16 bits, chosen to minimize space) or that are the fastest type with at least the given number of bits (e.g., int_fast16_t is a signed type with at least 16 bits, chosen to minimize time).

These are all defined using typedef; the main advantage of using stdint.h over defining them yourself is that if somebody ports your code to a new architecture, stdint.h should take care of choosing the right types automatically. The disadvantage is that, like many C99 features, stdint.h is not universally available on all C compilers.

If you need to print types defined in stdint.h, the larger inttypes.h header defines macros that give the corresponding format strings for printf.

7. Integer constants

Constant integer values in C can be written in any of four different ways:

  • In the usual decimal notation, e.g. 0, 1, -127, 9919291, 97.

  • In octal or base 8, when the leading digit is 0, e.g. 01 for 1, 010 for 8, 0777 for 511, 0141 for 97.

  • In hexadecimal or base 16, when prefixed with 0x. The letters a through f are used for the digits 10 through 15. For example, 0x61 is another way to write 97.

  • Using a character constant, which is a single ASCII character or an escape sequence inside single quotes. The value is the ASCII value of the character: 'a' is 97.3 Unlike languages with separate character types, C characters are identical to integers; you can (but shouldn't) calculate 972 by writing 'a'*'a'. You can also store a character anywhere.

Except for character constants, you can insist that an integer constant is unsigned or long by putting a u or l after it. So 1ul is an unsigned long version of 1. By default integer constants are (signed) ints. For long long constants, use ll, e.g., the unsigned long long constant 0xdeadbeef01234567ull. It is also permitted to write the l as L, which can be less confusing if the l looks too much like a 1.

8. Integer operators

8.1. Arithmetic operators

The usual + (addition), - (negation or subtraction), and * (multiplication) operators work on integers pretty much the way you'd expect. The only caveat is that if the result lies outside of the range of whatever variable you are storing it in, it will be truncated instead of causing an error:

   1     unsigned char c;
   2 
   3     c = -1;             /* sets c = 255 */
   4     c = 255 + 255;      /* sets c = 254 */
   5     c = 256 * 1772717;  /* sets c = 0 */

This can be a source of subtle bugs if you aren't careful. The usual giveaway is that values you thought should be large positive integers come back as random-looking negative integers.

Division (/) of two integers also truncates: 2/3 is 0, 5/3 is 1, etc. For positive integers it will always round down.

Prior to C99, if either the numerator or denominator is negative, the behavior was unpredictable and depended on what your processor does---in practice this meant you should never use / if one or both arguments might be negative. The C99 standard specified that integer division always removes the fractional part, effectively rounding toward 0; so (-3)/2 is -1, 3/-2 is -1, and (-3)/-2 is 1.

There is also a remainder operator % with e.g. 2%3 = 2, 5%3 = 2, 27 % 2 = 1, etc. The sign of the modulus is ignored, so 2%-3 is also 2. The sign of the dividend carries over to the remainder: (-3)%2 and (-3)%(-2) are both 1. The reason for this rule is that it guarantees that y == x*(y/x) + y%x is always true.

8.2. Bitwise operators

In addition to the arithmetic operators, integer types support bitwise logical operators that apply some Boolean operation to all the bits of their arguments in parallel. What this means is that the i-th bit of the output is equal to some operation applied to the i-th bit(s) of the input(s). The bitwise logical operators are ~ (bitwise negation: used with one argument as in ~0 for the all-1's binary value), & (bitwise AND), '|' (bitwise OR), and '^' (bitwise XOR, i.e. sum mod 2). These are mostly used for manipulating individual bits or small groups of bits inside larger words, as in the expression x & 0x0f, which strips off the bottom four bits stored in x.

Examples:

x

y

expression

value

0011

0101

x&y

0001

0011

0101

x|y

0111

0011

0101

x^y

0101

0011

0101

~x

1100

The shift operators << and >> shift the bit sequence left or right: x << y produces the value x⋅2y (ignoring overflow); this is equivalent to shifting every bit in x y positions to the left and filling in y zeros for the missing positions. In the other direction, x >> y produces the value ⌊x⋅2-y, by shifting x y positions to the right. The behavior of the right shift operator depends on whether x is unsigned or signed; for unsigned values, it shifts in zeros from the left end always; for signed values, it shifts in additional copies of the leftmost bit (the sign bit). This makes x >> y have the same sign as x if x is signed.

If y is negative, it reverses the direction of the shift; so x << -2 is equivalent to x >> 2.

Examples (unsigned char x):

x

y

x << y

x >> y

00000001

1

00000010

00000000

11111111

3

11111000

00011111

10111001

-2

00101110

11100100

Examples (signed char x):

x

y

x << y

x >> y

00000001

1

00000010

00000000

11111111

3

11111000

11111111

10111001

-2

11101110

11100100

Shift operators are often used with bitwise logical operators to set or extract individual bits in an integer value. The trick is that (1 << i) contains a 1 in the i-th least significant bit and zeros everywhere else. So x & (1<<i) is nonzero if and only if x has a 1 in the i-th place. This can be used to print out an integer in binary format (which standard printf won't do):

   1 void
   2 print_binary(unsigned int n)
   3 {
   4     unsigned int mask = 0;
   5 
   6     /* this grotesque hack creates a bit pattern 1000... */
   7     /* regardless of the size of an unsigned int */
   8     mask = ~mask ^ (~mask >> 1);
   9 
  10     for(; mask != 0; mask >>= 1) {
  11         putchar((n & mask) ? '1' : '0');
  12     }
  13 }

(See test_print_binary.c for a program that uses this.)

In the other direction, we can set the i-th bit of x to 1 by doing x | (1 << i) or to 0 by doing x & ~(1 << i). See C/BitExtraction for applications of this to build arbitrarily-large bit vectors.

8.3. Logical operators

To add to the confusion, there are also three logical operators that work on the truth-values of integers, where 0 is defined to be false and anything else is defined by be true. These are && (logical AND), ||, (logical OR), and ! (logical NOT). The result of any of these operators is always 0 or 1 (so !!x, for example, is 0 if x is 0 and 1 if x is anything else). The && and || operators evaluate their arguments left-to-right and ignore the second argument if the first determines the answer (this is the only place in C where argument evaluation order is specified); so

   1     0 && execute_programmer();
   2     1 || execute_programmer();

is in a very weak sense perfectly safe code to run.

Watch out for confusing & with &&. The expression 1 & 2 evaluates to 0, but 1 && 2 evaluates to 1. The statement 0 & execute_programmer(); is also unlikely to do what you want.

Yet another logical operator is the ternary operator ?:, where x ? y : z equals the value of y if x is nonzero and z if x is zero. Like && and ||, it only evaluates the arguments it needs to:

   1     fileExists(badFile) ? deleteFile(badFile) : createFile(badFile);

Most uses of ?: are better done using an if-then-else statement (C/Statements).

8.4. Relational operators

Logical operators usually operate on the results of relational operators or comparisons: these are '==' (equality), '!=' (inequality), '<=' (less than or equal to) and '>=' (greater than or equal to). So, for example,

    if(size >= MIN_SIZE && size <= MAX_SIZE) {
        puts("just right");
    }

tests if size is in the (inclusive) range [MIN_SIZE..MAX_SIZE].

Beware of confusing == with =. The code

   1     /* DANGER! DANGER! DANGER! */
   2     if(x = 5) {
   3         ...

is perfectly legal C, and will set x to 5 rather than testing if it's equal to 5. Because 5 happens to be nonzero, the body of the if statement will always be executed. This error is so common and so dangerous that gcc will warn you about any tests that look like this if you use the -Wall option.

9. Input and output

To input or output integer values, you will need to convert them from or to strings. Converting from a string is easy using the atoi or atol functions declared in stdlib.h; these take a string as an argument and return an int or long, respectively.4

Output is usually done using printf (or sprintf if you want to write to a string without producing output). Use the %d format specifier for ints, shorts, and chars that you want the numeric value of, %ld for longs, and %lld for long longs.

A contrived program that uses all of these features is given below:

   1 #include <stdio.h>
   2 #include <stdlib.h>
   3 
   4 /* This program can be used to how atoi etc. handle overflow. */
   5 /* For example, try "overflow 1000000000000". */
   6 int
   7 main(int argc, char **argv)
   8 {
   9     char c;
  10     int i;
  11     long l;
  12     long long ll;
  13     
  14     if(argc != 2) {
  15         fprintf(stderr, "Usage: %s n\n", argv[0]);
  16         return 1;
  17     }
  18     
  19     c = atoi(argv[1]);
  20     i = atoi(argv[1]);
  21     l = atol(argv[1]);
  22     ll = atoll(argv[1]);
  23 
  24     printf("char: %d  int: %d  long: %ld  long long: %lld", c, i, l, ll);
  25 
  26     return 0;
  27 }
overflow.c

10. Alignment

Modern CPU architectures typically enforce alignment restrictions on multi-byte values, which mean that the address of an int or long typically has to be a multiple of 4. This is an effect of the memory being organized as groups of 32 bits that are written in parallel instead of 8 bits. Such restrictions are not obvious when working with integer-valued variables directly, but will come up when we talk about pointers in C/Pointers.


CategoryProgrammingNotes

11. C/Statements

The bodies of C functions (including the main function) are made up of statements. These can either be simple statements that do not contain other statements, or compound statements that have other statements inside them. Control structures are compound statements like if/then/else, while, for, and do..while that control how or whether their component statements are executed.

12. Simple statements

The simplest kind of statement in C is an expression (followed by a semicolon, the terminator for all simple statements). Its value is computed and discarded. Examples:

   1     x = 2;              /* an assignment statement */
   2     x = 2+3;            /* another assignment statement */
   3     2+3;                /* has no effect---will be discarded by smart compilers */
   4     puts("hi");         /* a statement containing a function call */
   5     root2 = sqrt(2);    /* an assignment statement with a function call */

Most statements in a typical C program are simple statements of this form.

Other examples of simple statements are the jump statements return, break, continue, and goto. A return statement specifies the return value for a function (if there is one), and when executed it causes the function to exit immediately. The break and continue statements jump immediately to the end of a loop (or switch; see below) or the next iteration of a loop; we'll talk about these more when we talk about loops. The goto statement jumps to another location in the same function, and exists for the rare occasions when it is needed. Using it in most circumstances is a sin.

13. Compound statements

Compound statements come in two varieties: conditionals and loops.

13.1. Conditionals

These are compound statements that test some condition and execute one or another block depending on the outcome of the condition. The simplest is the if statement:

   1     if(houseIsOnFire) {
   2         /* ouch! */
   3         scream();
   4         runAway();
   5     }

The body of the if statement is executed only if the expression in parentheses at the top evaluates to true (which in C means any value that is not 0).

The braces are not strictly required, and are used only to group one or more statements into a single statement. If there is only one statement in the body, the braces can be omitted:

   1     if(programmerIsLazy) omitBraces();

This style is recommended only for very simple bodies. Omitting the braces makes it harder to add more statements later without errors.

   1     if(underAttack)
   2         launchCounterAttack();   /* executed only when attacked */
   3         hideInBunker();          /* ### DO NOT INDENT LIKE THIS ### executed always */

In the example above, the lack of braces means that the hideInBunker() statement is not part of the if statement, despite the misleading indentation. This sort of thing is why I generally always put in braces in an if.

An if statement may have an else clause, whose body is executed if the test is false (i.e. equal to 0).

   1     if(happy) {
   2         smile();
   3     } else {
   4         frown();
   5     }

A common idiom is to have a chain of if and else if branches that test several conditions:

   1     if(temperature < 0) {
   2         puts("brrr");
   3     } else if(temperature < 100) {
   4         puts("hooray");
   5     } else {
   6         puts("ouch!");
   7     }

This can be inefficient if there are a lot of cases, since the tests are applied sequentially. For tests of the form <expression> == <small constant>, the switch statement may provide a faster alternative. Here's a typical switch statement:

   1     /* print plural of cow, maybe using the obsolete dual number construction */
   2     switch(numberOfCows) {
   3     case 1:
   4         puts("cow");
   5         break;
   6     case 2:
   7         puts("cowen");
   8         break;
   9     default:
  10         puts("cows");
  11         break;
  12     }

This prints the string "cow" if there is one cow, "cowen" if there are two cowen, and "cows" if there are any other number of cows. The switch statement evaluates its argument and jumps to the matching case label, or to the default label if none of the cases match. Cases must be constant integer values.

The break statements inside the block jump to the end of the block. Without them, executing the switch with numberOfCows equal to 1 would print all three lines. This can be useful in some circumstances where the same code should be used for more than one case:

   1     switch(c) {
   2     case 'a':
   3     case 'e':
   4     case 'i':
   5     case 'o':
   6     case 'u':
   7         type = VOWEL;
   8         break;
   9     default:
  10         type = CONSONANT;
  11         break;
  12     }

or when a case "falls through" to the next:

   1     switch(countdownStart) {
   2     case 3:
   3         puts("3");
   4     case 2:
   5         puts("2");
   6     case 1:
   7         puts("1")
   8     case 0:
   9         puts("KABLOOIE!");
  10         break;
  11     default:
  12         puts("I can't count that high!");
  13         break;
  14     }

Note that it is customary to include a break on the last case even though it has no effect; this avoids problems later if a new case is added. It is also customary to include a default case even if the other cases supposedly exhaust all the possible values, as a check against bad or unanticipated inputs.

   1     switch(oliveSize) {
   2     case JUMBO():
   3         eatOlives(SLOWLY);
   4         break;
   5     case COLLOSSAL:
   6         eatOlives(QUICKLY);
   7         break;
   8     case SUPER_COLLOSSAL:
   9         eatOlives(ABSURDLY);
  10         break;
  11     default:
  12         /* unknown size! */
  13         abort();
  14         break;
  15     }

Though switch statements are better than deeply nested if/else-if constructions, it is often even better to organize the different cases as data rather than code. We'll see examples of this when we talk about function pointers C/FunctionPointers.

Nothing in the C standards prevents the case labels from being buried inside other compound statements. One rather hideous application of this fact is Duff's device.

13.2. Loops

There are three kinds of loops in C.

13.2.1. The while loop

A while loop tests if a condition is true, and if so, executes its body. It then tests the condition is true again, and keeps executing the body as long as it is. Here's a program that deletes every occurence of the letter e from its input.

   1 #include <stdio.h>
   2 
   3 int
   4 main(int argc, char **argv)
   5 {
   6     int c;
   7 
   8     while((c = getchar()) != EOF) {
   9         switch(c) {
  10         case 'e':
  11         case 'E':
  12             break;
  13         default:
  14             putchar(c);
  15             break;
  16         }
  17     }
  18 
  19     return 0;
  20 }

Note that the expression inside the while argument both assigns the return value of getchar to c and tests to see if it is equal to EOF (which is returned when no more input characters are available). This is a very common idiom in C programs. Note also that even though c holds a single character, it is declared as an int. The reason is that EOF (a constant defined in stdio.h) is outside the normal character range, and if you assign it to a variable of type char it will be quietly truncated into something else. Because C doesn't provide any sort of exception mechanism for signalling unusual outcomes of function calls, designers of library functions often have to resort to extending the output of a function to include an extra value or two to signal failure; we'll see this a lot when the null pointer shows up in C/Pointers.

13.2.2. The do..while loop

The do..while statement is like the while statement except the test is done at the end of the loop instead of the beginning. This means that the body of the loop is always executed at least once.

Here's a loop that does a random walk until it gets back to 0 (if ever). If we changed the do..while loop to a while loop, it would never take the first step, because pos starts at 0.

   1 #include <stdio.h>
   2 #include <stdlib.h>
   3 #include <time.h>
   4 
   5 int
   6 main(int argc, char **argv)
   7 {
   8     int pos = 0;       /* position of random walk */
   9 
  10     srandom(time(0));  /* initialize random number generator */
  11 
  12     do {
  13         pos += random() & 0x1 ? +1 : -1;
  14         printf("%d\n", pos);
  15     } while(pos != 0);
  16 
  17     return 0;
  18 }
random_walk.c

The do..while loop is used much less often in practice than the while loop. Note that it is always possible to convert a do..while loop to a while loop by making an extra copy of the body in front of the loop.

13.2.3. The for loop

The for loop is a form of SyntacticSugar that is used when a loop iterates over a sequence of values stored in some variable (or variables). Its argument consists of three expressions: the first initializes the variable and is called once when the statement is first reached. The second is the test to see if the body of the loop should be executed; it has the same function as the test in a while loop. The third sets the variable to its next value. Some examples:

   1     /* count from 0 to 9 */
   2     for(i = 0; i < 10; i++) {
   3         printf("%d\n", i);
   4     }
   5     
   6     /* and back from 10 to 0 */
   7     for(i = 10; i >= 0; i--) {
   8         printf("%d\n", i);
   9     }
  10 
  11     /* this loop uses some functions to move around */
  12     for(c = firstCustomer(); c != END_OF_CUSTOMERS; c = customerAfter(c)) {
  13         helpCustomer(c);
  14     }
  15 
  16     /* this loop prints powers of 2 that are less than n*/
  17     for(i = 1; i < n; i *= 2) {
  18         printf("%d\n", i);
  19     }
  20 
  21     /* this loop does the same thing with two variables by using the comma operator */
  22     for(i = 0, power = 1; power < n; i++, power *= 2) {
  23         printf("2^%d = %d\n", i, power);
  24     }
  25 
  26     /* Here are some nested loops that print a times table */
  27     for(i = 0; i < n; i++) {
  28         for(j = 0; j < n; j++) {
  29             printf("%d*%d=%d ", i, j, i*j);
  30         }
  31         putchar('\n');
  32     }

A for loop can always be rewritten as a while loop.

   1     for(i = 0; i < 10; i++) {
   2         printf("%d\n", i);
   3     }
   4 
   5     /* is exactly the same as */
   6 
   7     i = 0;
   8     while(i < 10) {
   9         printf("%d\n", i);
  10         i++;
  11     }

13.2.4. Loops with break, continue, and goto

The break statement immediately exits the innermmost enclosing loop or switch statement.

   1     for(i = 0; i < n; i++) {
   2         openDoorNumber(i);
   3         if(boobyTrapped()) {
   4             break;
   5         }
   6     }

The continue statement skips to the next iteration. Here is a program with a loop that iterates through all the integers from -10 through 10, skipping 0:

   1 #include <stdio.h>
   2 
   3 /* print a table of inverses */
   4 #define MAXN (10)
   5 
   6 int
   7 main(int argc, char **argv)
   8 {
   9     int n;
  10 
  11     for(n = -MAXN; n <= MAXN; n++) {
  12         if(n == 0) continue;
  13         printf("1.0/%3d = %+f\n", n, 1.0/n);
  14     }
  15 
  16     return 0;
  17 }
inverses.c

Occasionally, one would like to break out of more than one nested loop. The way to do this is with a goto statement.

   1     for(i = 0; i < n; i++) {
   2         for(j = 0; j < n; j++) {
   3             doSomethingTimeConsumingWith(i, j);
   4             if(checkWatch() == OUT_OF_TIME) {
   5                 goto giveUp;
   6             }
   7         }
   8     }
   9 giveUp:
  10     puts("done");

The target for the goto is a label, which is just an identifier followed by a colon and a statement (the empty statement ; is ok).

The goto statement can be used to jump anywhere within the same function body, but breaking out of nested loops is widely considered to be its only genuinely acceptable use in normal code.

13.3. Choosing where to put a loop exit

Choosing where to put a loop exit is usually pretty obvious: you want it after any code that you want to execute at least once, and before any code that you want to execute only if the termination test fails.

If you know in advance what values you are going to be iterating over, you will most likely be using a for loop:

   1 for(i = 0; i < n; i++) {
   2     a[i] = 0;
   3 }

Most of the rest of the time, you will want a while loop:

   1 while(!done()) {
   2     doSomething();
   3 }

The do..while loop comes up mostly when you want to try something, then try again if it failed:

   1 do {
   2     result = fetchWebPage(url);
   3 } while(result == 0);

Finally, leaving a loop in the middle using break can be handy if you have something extra to do before trying again:

   1 for(;;) {
   2     result = fetchWebPage(url);
   3     if(result != 0) {
   4         break;
   5     }
   6     /* else */
   7     fprintf(stderr, "fetchWebPage failed with error code %03d\n", result);
   8     sleep(retryDelay);  /* wait before trying again */
   9 }

(Note the empty for loop header means to loop forever; while(1) also works.)


CategoryProgrammingNotes

14. C/AssemblyLanguage

C is generally compiled to assembly language first, and then an assembler compiles the assembly language to actual machine code. Typically the intermediate assembly language code is deleted.

If you are very curious about what the compiler does to your code, you can run gcc with the -S option to tell it to stop after creating the assembly file; for example, the command gcc -S program.c will create a file program.s containing the compiled code. What this code will look like depends both on what your C code looks like and what other options you give the C compiler. You can learn how to read (or write) assembly for the Intel i386 architecture by following this link (but see these notes for an explanation of the variant of i386 assembly language supported by gas, the GNU assembler used by gcc), or you can just jump in and guess what is going on by trying to expand the keywords (hint: movl stands for "move long," addl for "add long," etc., numbers preceded by dollar signs are constants, and things like %ebx are the names of registers, high-speed memory locations built in to the CPU).

For example, here's a short C program that implements a simple loop in two different ways, the first using the standard for loop construct and the second building a for loop by hand using goto (don't do this).

   1 #include <stdio.h>
   2 
   3 int
   4 main(int argc, char **argv)
   5 {
   6     int i;
   7 
   8     /* normal loop */
   9     for(i = 0; i < 10; i++) {
  10         printf("%d\n", i);
  11     }
  12 
  13     /* non-standard implementation using if and goto */
  14     i = 0;
  15   start:
  16     if(i < 10) {
  17         printf("%d\n", i);
  18         i++;
  19         goto start;
  20     }
  21     
  22     return 0;
  23 }
loops.c

and here is the output of the compiler using gcc -S loops.c:

        .file   "loops.c"
        .section        .rodata
.LC0:
        .string "%d\n"
        .text
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        andl    $-16, %esp
        movl    $0, %eax
        subl    %eax, %esp
        movl    $0, -4(%ebp)
.L2:
        cmpl    $9, -4(%ebp)
        jle     .L5
        jmp     .L3
.L5:
        movl    -4(%ebp), %eax
        movl    %eax, 4(%esp)
        movl    $.LC0, (%esp)
        call    printf
        leal    -4(%ebp), %eax
        incl    (%eax)
        jmp     .L2
.L3:
        movl    $0, -4(%ebp)
.L6:
        cmpl    $9, -4(%ebp)
        jg      .L7
        movl    -4(%ebp), %eax
        movl    %eax, 4(%esp)
        movl    $.LC0, (%esp)
        call    printf
        leal    -4(%ebp), %eax
        incl    (%eax)
        jmp     .L6
.L7:
        movl    $0, %eax
        leave
        ret
        .size   main, .-main
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.3.4 (Debian)"

Note that even though the two loops are doing the same thing, the structure is different. The first uses jle (jump if less than or equal to) to jump over the jmp that skips the body of the loop if it shouldn't be executed, while the second uses jg (jump if greater than) to skip it directly, which is closer to what the C code says to do. This is not unusual; for built-in control structures the compiler will often build odd-looking implementations that still work, for subtle and mysterious reasons of its own.

We can make them even more different by turning on the optimizer, e.g. with gcc -S -O3 -funroll-loops loops.c:

        .file   "loops.c"
        .section        .rodata.str1.1,"aMS",@progbits,1
.LC0:
        .string "%d\n"
        .text
        .p2align 4,,15
.globl main
        .type   main, @function
main:
        pushl   %ebp
        xorl    %ecx, %ecx
        movl    %esp, %ebp
        pushl   %ebx
        subl    $20, %esp
        movl    $2, %ebx
        andl    $-16, %esp
        movl    %ecx, 4(%esp)
        movl    $.LC0, (%esp)
        call    printf
        movl    $.LC0, (%esp)
        movl    $1, %edx
        movl    %edx, 4(%esp)
        call    printf
        movl    %ebx, 4(%esp)
        movl    $5, %ebx
        movl    $.LC0, (%esp)
        call    printf
        movl    $.LC0, (%esp)
        movl    $3, %ecx
        movl    %ecx, 4(%esp)
        call    printf
        movl    $.LC0, (%esp)
        movl    $4, %edx
        movl    %edx, 4(%esp)
        call    printf
        movl    %ebx, 4(%esp)
        xorl    %ebx, %ebx
        movl    $.LC0, (%esp)
        call    printf
        movl    $.LC0, (%esp)
        movl    $6, %ecx
        movl    %ecx, 4(%esp)
        call    printf
        movl    $.LC0, (%esp)
        movl    $7, %edx
        movl    %edx, 4(%esp)
        call    printf
        movl    $.LC0, (%esp)
        movl    $8, %eax
        movl    %eax, 4(%esp)
        call    printf
        movl    $.LC0, (%esp)
        movl    $9, %edx
        movl    %edx, 4(%esp)
        .p2align 4,,15
.L31:
        call    printf
.L7:
        cmpl    $9, %ebx
        jg      .L8
        movl    %ebx, 4(%esp)
        incl    %ebx
        movl    $.LC0, (%esp)
        jmp     .L31
.L8:
        movl    -4(%ebp), %ebx
        xorl    %eax, %eax
        leave
        ret
        .size   main, .-main
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.3.4 (Debian)"

Now the first loop is gone, replaced by ten separate calls to printf. Some of the arguments are loaded up in odd places; for example, 2 and 5 are stored in their registers well before they are used. Again, the compiler moves in mysterious ways, but guarantees that the resulting code will do what you asked. The second loop is left pretty much the same as it was, since the C compiler doesn't have enough information to recognize it as a loop.

The moral of the story: if you write your code using standard idioms, it's more likely that the C compiler will understand what you want and be able to speed it up for you. The odds are that using a non-standard idiom will cost you more in time by confusing the compiler than it will gain you in slight code improvements.


CategoryProgrammingNotes

15. C/Functions

A function, procedure, or subroutine encapsulates some complex computation as a single operation. Typically, when we call a function, we pass as arguments all the information this function needs, and any effect it has will be reflected in either its return value or (in some cases) in changes to values pointed to by the arguments. Inside the function, the arguments are copied into local variables, which can be used just like any other local variable---they can even be assigned to without affecting the original argument.

16. Function definitions

A typical function definition looks like this:

   1 /* Returns the square of the distance between two points separated by 
   2    dx in the x direction and dy in the y direction. */
   3 int
   4 distSquared(int dx, int dy)
   5 {
   6     return dx*dx + dy*dy;
   7 }

The part outside the braces is called the function declaration; the braces and their contents is the function body.

Like most complex declarations in C, once you delete the type names the declaration looks like how the function is used: the name of the function comes before the parentheses and the arguments inside. The ints scattered about specify the type of the return value of the function (Line 3) and of the parameters (Line 4); these are used by the compiler to determine how to pass values in and out of the function and (usually for more complex types, since numerical types will often convert automatically) to detect type mismatches.

If you want to define a function that doesn't return anything, declare its return type as void. You should also declare a parameter list of void if the function takes no arguments.

   1 /* Prints "hi" to stdout */
   2 void
   3 helloWorld(void)
   4 {
   5     puts("hi");
   6 }

It is not strictly speaking an error to omit the second void here. Putting void in for the parameters tells the compiler to enforce that no arguments are passed in. If we had instead declared helloWorld as

   1 /* Prints "hi" to stdout */
   2 void
   3 helloWorld()    /* DANGER! */
   4 {
   5     puts("hi");
   6 }

it would be possible to call it as

   1     helloWorld("this is a bogus argument");

without causing an error. The reason is that a function declaration with no arguments means that the function can take an unspecified number of arguments, and it's up to the user to make sure they pass in the right ones. There are good historical reasons for what may seem like obvious lack of sense in the design of the language here, and fixing this bug would break most C code written before 1989. But you shouldn't ever write a function declaration with an empty argument list, since you want the compiler to know when something goes wrong.

17. Calling a function

A function call consists of the function followed by its arguments (if any) inside parentheses, separated by comments. For a function with no arguments, call it with nothing between the parentheses. A function call that returns a value can be used in an expression just like a variable. A call to a void function can only be used as an expression by itself:

   1     totalDistance += distSquared(x1 - x2, y1 - y2);
   2     helloWorld();
   3     greetings += helloWorld();  /* ERROR */

18. The return statement

To return a value from a function, write a return statement, e.g.

   1     return 172;

The argument to return can be any expression. Unlike the expression in, say, an if statement, you do not need to wrap it in parentheses. If a function is declared void, you can do a return with no expression, or just let control reach the end of the function.

Executing a return statement immediately terminates the function. This can be used like break to get out of loops early.

   1 /* returns 1 if n is prime, 0 otherwise */
   2 int
   3 isPrime(int n)
   4 {
   5     int i;
   6 
   7     if (n < 2) return 0;   /* special case for 0, 1, negative n */
   8  
   9     for(i = 2; i < n; i++) {
  10         if (n % i == 0) {
  11             /* found a factor */
  12             return 0;
  13         }
  14     }
  15 
  16     /* no factors */
  17     return 1;
  18 }

19. Function declarations and modules

By default, functions have global scope: they can be used anywhere in your program, even in other files. If a file doesn't contain a declaration for a function someFunc before it is used, the compiler will assume that it is declared like int someFunc() (i.e., return type int and unknown arguments). This can produce infuriating complaints later when the compiler hits the real declaration and insists that your function someFunc should be returning an int and you are a bonehead for declaring it otherwise.

The solution to such insulting compiler behavior errors is to either (a) move the function declaration before any functions that use it; or (b) put in a declaration without a body before any functions that use it, in addition to the declaration that appears in the function definition. (Note that this violates the no separate but equal rule, but the compiler should tell you when you make a mistake.) Option (b) is generally preferred, and is the only option when the function is used in a different file.

To make sure that all declarations of a function are consistent, the usual practice is to put them in an include file. For example, if distSquared is used in a lot of places, we might put it in its own file distSquared.c:

   1 #include "distSquared.h"
   2 
   3 int
   4 distSquared(int dx, int dy)
   5 {
   6     return dx*dx + dy*dy;
   7 }

This file uses #include to include a copy of this file, distSquared.h:

   1 /* Returns the square of the distance between two points separated by 
   2    dx in the x direction and dy in the y direction. */
   3 int distSquared(int dx, int dy);

Note that the declaration in distSquared.h doesn't have a body; instead, it's terminated by a semicolon like a variable declaration. It's also worth noting that we moved the documenting comment to distSquared.h: the idea is that distSquared.h is the public face of this (very small one-function) module, and so the explanation of how to use the function should be there.

The reason distSquared.c includes distSquared.h is to get the compiler to verify that the declarations in the two files match. But to use the distSquared function, we also put #include "distSquared.h" at the top of the file that uses it:

   1 #include "distSquared.h"
   2 
   3 #define THRESHOLD (100)
   4 
   5 int
   6 tooClose(int x1, int y1, int x2, int y2)
   7 {
   8     return distSquared(x1 - x2, y1 - y2) < THRESHOLD;
   9 }

The #include on line 1 uses double quotes instead of angle brackets; this tells the compiler to look for distSquared.h in the current directory instead of the system include directory (typically /usr/include).

20. Static functions

By default, all functions are global; they can be used in any file of your program whether or not a declaration appears in a header file. To restrict access to the current file, declare a function static, like this:

   1 static void
   2 helloHelper(void)
   3 {
   4     puts("hi!");
   5 }
   6 
   7 void
   8 hello(int repetitions)
   9 {
  10     int i;
  11 
  12     for(i = 0; i < repetitions; i++) {
  13         helloHelper();
  14     }
  15 }

The function hello will be visible everywhere. The function helloHelper will only be visible in the current file.

It's generally good practice to declare a function static unless you intend to make it available, since not doing so can cause namespace conflicts, where the presence of two functions with the same name either prevent the program from linking or---even worse---cause the wrong function to be called. The latter can happen with library functions, since C allows the programmer to override library functions by defining a new function with the same name. I once had a program fail in a spectacularly incomprehensible way because I'd written a select function without realizing that select is a core library function in C.

21. Local variables

A function may contain definitions of local variables, which are visible only inside the function and which survive only until the function returns. These may be declared at the start of any block (group of statements enclosed by braces), but it is conventional to declare all of them at the outermost block of the function.

   1 /* Given n, compute n! = 1*2*...*n */
   2 /* Warning: will overflow on 32-bit machines if n > 12 */
   3 int
   4 factorial(int n)
   5 {
   6     int i;
   7     int product;
   8 
   9     if(n < 2) return n;
  10     /* else */
  11 
  12     product = 1;
  13 
  14     for(i = 2; i <= n; i++) {
  15         product *= i;
  16     }
  17 
  18     return product;
  19 }

22. Mechanics of function calls

Several things happen under the hood when a function is called. Since a function can be called from several different places, the CPU needs to store its previous state to know where to go back. It also needs to allocate space for function arguments and local variables.

Some of this information will be stored in registers, memory locations built into the CPU itself, but most will go on the stack, a region of memory that on typical machines grows downward, even though the most recent additions to the stack are called the "top" of the stack. The location of the top of the stack is stored in the CPU in a special register called the stack pointer.

So a typical function call looks like this internally:

  1. The current instruction pointer or program counter value, which gives the address of the next line of machine code to be executed, is pushed onto the stack.

  2. Any arguments to the function are copied either into specially designated registers or onto new locations on the stack. The exact rules for how to do this vary from one CPU architecture to the next, but a typical convention might be that the first four arguments or so are copied into registers and the rest (if any) go on the stack.
  3. The instruction pointer is set to the first instruction in the code for the function.
  4. The function allocates additional space on the stack to hold its local variables (if any) and to save copies of the values of any registers it wants to use (so that it can restore their contents before returning to its caller).
  5. The function body is executed until it hits a return statement.

  6. Returning from the function is the reverse of invoking it: any saved registers are restored from the stack, the return value is copied to a standard register, and the values of the instruction pointer and stack pointer are restored to what they were before the function call.

From the programmer's perspective, the important point is that both the arguments and the local variables inside a function are stored in freshly-allocated locations that are thrown away after the function exits. So after a function call the state of the CPU is restored to its previous state, except for the return value. Any arguments that are passed to a function are passed as copies, so changing the values of the function arguments inside the function has no effect on the caller. Any information stored in local variables is lost.

Under rare circumstances, it may be useful to have a variable local to a function that persists from one function call to the next. You can do so by declaring the variable static. For example, here is a function that counts how many times it has been called:

   1 /* return the number of times the function has been called */
   2 int
   3 counter(void)
   4 {
   5     static count = 0;
   6 
   7     return ++count;
   8 }

Static local variables are stored outside the stack with global variables, and have unbounded extent. But they are only visible inside the function that declares them. This makes them slightly less dangerous than global variables---there is no fear that some foolish bit of code elsewhere will quietly change their value---but it is still the case that they usually aren't what you want. It is also likely that operations on static variables will be slightly slower than operations on ordinary ("automatic") variables, since making them persistent means that they have to be stored in (slow) main memory instead of (fast) registers.


CategoryProgrammingNotes

23. C/InputOutput

Input and output from C programs is typically done through the standard I/O library, whose functions etc. are declared in stdio.h. A detailed descriptions of the functions in this library is given in Appendix B of KernighanRitchie. We'll talk about some of the more useful functions and about how input-output (I/O) works on Unix-like operating systems in general.

24. Character streams

The standard I/O library works on character streams, objects that act like long sequences of incoming or outgoing characters. What a stream is connected to is often not apparent to a program that uses it; an output stream might go to a terminal, to a file, or even to another program (appearing there as an input stream).

Three standard streams are available to all programs: these are stdin (standard input), stdout (standard output), and stderr (standard error). Standard I/O functions that do not take a stream as an argument will generally either read from stdin or write to stdout. The stderr stream is used for error messages. It is kept separate from stdout so that you can see these messages even if you redirect output to a file:

$ ls no-such-file > /tmp/dummy-output
ls: no-such-file: No such file or directory

25. Reading and writing single characters

To read a single character from stdin, use getchar:

   1     int c;
   2 
   3     c = getchar();

The getchar routine will return the special value EOF (usually -1; short for end of file) if there are no more characters to read, which can happen when you hit the end of a file or when the user types the end-of-file key control-D to the terminal. Note that the return value of getchar is declared to be an int since EOF lies outside the normal character range.

To write a single character to stdout, use putchar:

   1     putchar('!');

Even though putchar can only write single bytes, it takes an int as an argument. Any value outside the range 0..255 will be truncated to its last byte, as in the usual conversion from int to unsigned char.

Both getchar and putchar are wrappers for more general routines getc and putc that allow you to specify which stream you are using. To illustrate getc and putc, here's how we might define getchar and putchar if they didn't exist already:

   1 int
   2 getchar2(void)
   3 {
   4     return getc(stdin);
   5 }
   6 
   7 int
   8 putchar2(int c)
   9 {
  10     return putc(c, stdout);
  11 }

Note that putc, putchar2 as defined above, and the original putchar all return an int rather than void; this is so that they can signal whether the write succeeded. If the write succeeded, putchar or putc will return the value written. If the write failed (say because the disk was full), then putc or putchar will return EOF.

Here's another example of using putc to make a new function putcerr that writes a character to stderr:

   1 int
   2 putcerr(int c)
   3 {
   4     return putc(c, stderr);
   5 }

A rather odd feature of the C standard I/O library is that if you don't like the character you just got, you can put it back using the ungetc function. The limitations on ungetc are that (a) you can only push one character back, and (b) that character can't be EOF. The ungetc function is provided because it makes certain high-level input tasks easier; for example, if you want to parse a number written as a sequence of digits, you need to be able to read characters until you hit the first non-digit. But if the non-digit is going to be used elsewhere in your program, you don't want to eat it. The solution is to put it back using ungetc.

Here's a function that uses ungetc to peek at the next character on stdin without consuming it:

   1 /* return the next character from stdin without consuming it */
   2 int
   3 peekchar(void)
   4 {
   5     int c;
   6 
   7     c = getchar();
   8     if(c != EOF) ungetc(c, stdin);      /* puts it back */
   9     
  10     return c;
  11 }

26. Formatted I/O

Reading and writing data one character at a time can be painful. The C standard I/O library provides several convenient routines for reading and writing formatted data. The most commonly used one is printf, which takes as arguments a format string followed by zero or more values that are filled in to the format string according to patterns appearing in it.

Here are some typical printf statements:

   1     printf("Hello\n");          /* print "Hello" followed by a newline */
   2     printf("%c", c);            /* equivalent to putchar(c) */
   3     printf("%d", n);            /* print n (an int) formatted in decimal */
   4     printf("%u", n);            /* print n (an unsigned int) formatted in decimal */
   5     printf("%o", n);            /* print n (an unsigned int) formatted in octal */
   6     printf("%x", n);            /* print n (an unsigned int) formatted in hexadecimal */
   7     printf("%f", x);            /* print x (a float or double) */
   8 
   9     /* print total (an int) and average (a double) on two lines with labels */
  10     printf("Total: %d\nAverage: %f\n", total, average);

For a full list of formatting codes see Table B-1 in KernighanRitchie, or run man 3 printf.

The inverse of printf is scanf. The scanf function reads formatted data from stdin according to the format string passed as its first argument and stuffs the results into variables whose addresses are given by the later arguments. This requires prefixing each such argument with the & operator, which takes the address of a variable.

Format strings for scanf are close enough to format strings for printf that you can usually copy them over directly. However, because scanf arguments don't go through argument promotion (where all small integer types are converted to int and floats are converted to double), you have to be much more careful about specifying the type of the argument correctly.

   1     scanf("%c", &c);            /* like c = getchar(); c must be a char */
   2     scanf("%d", &n);            /* read an int formatted in decimal */
   3     scanf("%u", &n);            /* read an unsigned int formatted in decimal */
   4     scanf("%o", &n);            /* read an unsigned int formatted in octal */
   5     scanf("%x", &n);            /* read an unsigned int formatted in hexadecimal */
   6     scanf("%f", &x);            /* read a float */
   7     scanf("%lf", &x);           /* read a double */
   8 
   9     /* read total (an int) and average (a float) on two lines with labels */
  10     /* (will also work if input is missing newlines or uses other whitespace, see below) */
  11     scanf("Total: %d\nAverage: %f\n", &total, &average);

The scanf routine eats whitespace (spaces, tabs, newlines, etc.) in its input whenever it sees a conversion specification or a whitespace character in its format string. Non-whitespace characters that are not part of conversion specifications must match exactly. To detect if scanf parsed everything successfully, look at its return value; it returns the number of values it filled in, or EOF if it hits end-of-file before filling in any values.

The printf and scanf routines are wrappers for fprintf and fscanf, which take a stream as their first argument, e.g.:

   1     fprintf(stderr, "BUILDING ON FIRE, %d%% BURNT!!!\n", percentage);

Note the use of "%%" to print a single percent in the output.

27. Rolling your own I/O routines

Since we can write our own functions in C, if we don't like what the standard routines do, we can build our own on top of them. For example, here's a function that reads in integer values without leading minus signs and returns the result. It uses the peekchar routine we defined above, as well as the isdigit routine declared in ctype.h.

   1 /* read an integer written in decimal notation from stdin until the first
   2  * non-digit and return it.  Returns 0 if there are no digits. */
   3 int
   4 readNumber(void)
   5 {
   6     int accumulator;    /* the number so far */
   7     int c;              /* next character */
   8 
   9     accumulator = 0;
  10 
  11     while((c = peekchar()) != EOF && isdigit(c)) {
  12         c = getchar();                  /* consume it */
  13         accumulator *= 10;              /* shift previous digits over */
  14         accumulator += (c - '0');       /* add decimal value of new digit */
  15     }
  16 
  17     return accumulator;
  18 }

Here's another implementation that does almost the same thing:

   1 int
   2 readNumber2(void)
   3 {
   4     int n;
   5 
   6     if(scanf("%u", &n) == 1) {
   7         return n;
   8     } else {
   9         return 0;
  10     }
  11 }

The difference is that readNumber2 will consume any whitespace before the first digit, which may or may not be what we want.

More complex routines can be used to parse more complex input. For example, here's a routine that uses readNumber to parse simple arithmetic expressions, where each expression is either a number or of the form (expression+expression) or (expression*expression). The return value is the value of the expression after adding together or multiplying all of its subexpressions. (A complete program including this routine and the others defined earlier that it uses can be found in calc.c.)

   1 #define EXPRESSION_ERROR (-1)
   2 
   3 /* read an expression from stdin and return its value */
   4 /* returns EXPRESSION_ERROR on error */
   5 int
   6 readExpression(void)
   7 {
   8     int e1;             /* value of first sub-expression */
   9     int e2;             /* value of second sub-expression */
  10     int c;
  11     int op;             /* operation: '+' or '*' */
  12 
  13     c = peekchar();
  14 
  15     if(c == '(') {
  16         c = getchar();
  17 
  18         e1 = readExpression();
  19         op = getchar();
  20         e2 = readExpression();
  21 
  22         c = getchar();  /* this had better be ')' */
  23         if(c != ')') return EXPRESSION_ERROR;
  24 
  25         /* else */
  26         switch(op) {
  27         case '*':
  28             return e1*e2;
  29             break;
  30         case '+':
  31             return e1+e2;
  32             break;
  33         default:
  34             return EXPRESSION_ERROR;
  35             break;
  36         }
  37     } else if(isdigit(c)) {
  38         return readNumber();
  39     } else {
  40         return EXPRESSION_ERROR;
  41     }
  42 }

Because this routine calls itself recursively as it works its way down through the input, it is an example of a recursive descent parser. Parsers for more complicated languages (e.g. C) are usually not written by hand like this, but are instead constructed mechanically using a Parser generator.

28. File I/O

Reading and writing files is done by creating new streams attached to the files. The function that does this is fopen. It takes two arguments: a filename, and a flag that controls whether the file is opened for reading or writing. The return value of fopen has type FILE * and can be used in putc, getc, fprintf, etc. just like stdin, stdout, or stderr. When you are done using a stream, you should close it using fclose.

Here's a program that reads a list of numbers from a file whose name is given as argv[1] and prints their sum:

   1 #include <stdio.h>
   2 #include <stdlib.h>
   3 
   4 int
   5 main(int argc, char **argv)
   6 {
   7     FILE *f;
   8     int x;
   9     int sum;
  10 
  11     if(argc < 2) {
  12         fprintf(stderr, "Usage: %s filename\n", argv[0]);
  13         exit(1);
  14     }
  15 
  16     f = fopen(argv[1], "r");
  17     if(f == 0) {
  18         /* perror is a standard C library routine */
  19         /* that prints a message about the last failed library routine */
  20         /* prepended by its argument */
  21         perror(filename);
  22         exit(2);
  23     }
  24 
  25     /* else everything is ok */
  26     sum = 0;
  27     while(fscanf("%d", &x) == 1) {
  28         sum += x;
  29     }
  30 
  31     printf("%d\n", sum);
  32 
  33     /* not strictly necessary but it's polite */
  34     fclose(f);
  35 
  36     return 0;
  37 }

To write to a file, open it with fopen(filename, "w"). Note that as soon as you call fopen with the "w" flag, any previous contents of the file are erased. If you want to append to the end of an existing file, use "a" instead. You can also add + onto the flag if you want to read and write the same file (this will probably involve using fseek).

Some operating systems (Windows) make a distinction between text and binary files. For text files, use the same arguments as above. For binary files, add a b, e.g. fopen(filename, "wb") to write a binary file.

   1 /* leave a greeting in the current directory */
   2 
   3 #include <stdio.h>
   4 #include <stdlib.h>
   5 
   6 #define FILENAME "hello.txt"
   7 #define MESSAGE "hello world"
   8 
   9 int
  10 main(int argc, char **argv)
  11 {
  12     FILE *f;
  13 
  14     f = fopen(FILENAME, "w");
  15     if(f == 0) {
  16         perror(FILENAME);
  17         exit(1);
  18     }
  19 
  20     /* unlike puts, fputs doesn't add a newline */
  21     fputs(MESSAGE, f);
  22     putc('\n', f);
  23 
  24     fclose(f);
  25 
  26     return 0;
  27 }
helloFile.c


CategoryProgrammingNotes

29. C/Pointers

30. Memory and addresses

Memory in a typical modern computer is divided into two classes: a small number of registers, which live on the CPU chip and perform specialized functions like keeping track of the location of the next machine code instruction to execute or the current stack frame, and main memory, which (mostly) lives outside the CPU chip and which stores the code and data of a running program. When the CPU wants to fetch a value from a particular location in main memory, it must supply an address: a 32-bit or 64-bit unsigned integer on typical current architectures, referring to one of up to 232 or 264 distinct 8-bit locations in the memory. These integers can be manipulated like any other integer; in C, they appear as pointers, a family of types that can be passed as arguments, stored in variables, returned from functions, etc.

31. Pointer variables

31.1. Declaring a pointer variable

The convention is C is that the declaration of a complex type looks like its use. To declare a pointer-valued variable, write a declaration for the thing that it points to, but include a * before the variable name:

   1     int *pointerToInt;
   2     double *pointerToDouble;
   3     char *pointerToChar;
   4     char **pointerToPointerToChar;

31.2. Assigning to pointer variables

Declaring a pointer-valued variable allocates space to hold the pointer but not to hold anything it points to. Like any other variable in C, a pointer-valued variable will initially contain garbage---in this case, the address of a location that might or might not contain something important. To initialize a pointer variable, you have to assign to it the address of something that already exists. Typically this is done using the & (address-of) operator:

   1     int n;              /* an int variable */
   2     int *p;             /* a pointer to an int */
   3 
   4     p = &n;             /* p now points to n */

31.3. Using a pointer

Pointer variables can be used in two ways: to get their value (a pointer), e.g. if you want to assign an address to more than one pointer variable:

   1     int n;              /* an int variable */
   2     int *p;             /* a pointer to an int */
   3     int *q;             /* another pointer to an int */
   4 
   5     p = &n;             /* p now points to n */
   6     q = p;              /* q now points to n as well */

But more often you will want to work on the value stored at the location pointed to. You can do this by using the * (dereference) operator, which acts as an inverse of the address-of operator:

   1     int n;              /* an int variable */
   2     int *p;             /* a pointer to an int */
   3 
   4     p = &n;             /* p now points to n */
   5 
   6     *p = 2;             /* sets n to 2 */
   7     *p = *p + *p;       /* sets n to 4 */

The * operator binds very tightly, so you can usually use *p anywhere you could use the variable it points to without worrying about parentheses. However, a few operators, such as --, ++, and . (used in C/Structs) bind tighter, requiring parantheses if you want the * to take precedence.

   1     (*p)++;             /* increment the value pointed to by p */
   2     *p++;               /* WARNING: increments p itself */

31.4. Printing pointers

You can print a pointer value using printf with the %p format specifier. To do so, you should convert the pointer to type void * first using a cast (see below for void * pointers), although on machines that don't have different representations for different pointer types, this may not be necessary.

Here is a short program that prints out some pointer values:

   1 #include <stdio.h>
   2 #include <stdlib.h>
   3 
   4 int G = 0;   /* a global variable, stored in BSS segment */
   5 
   6 int
   7 main(int argc, char **argv)
   8 {
   9     static int s;  /* static local variable, stored in BSS segment */
  10     int a;         /* automatic variable, stored on stack */
  11     int *p;        /* pointer variable for malloc below */
  12 
  13     /* obtain a block big enough for one int from the heap */
  14     p = malloc(sizeof(int));
  15 
  16     printf("&G   = %p\n", (void *) &G);
  17     printf("&s   = %p\n", (void *) &s);
  18     printf("&a   = %p\n", (void *) &a);
  19     printf("&p   = %p\n", (void *) &p);
  20     printf("p    = %p\n", (void *) p);
  21     printf("main = %p\n", (void *) main);
  22 
  23     free(p);
  24 
  25     return 0;
  26 }
looking_at_pointers.c

When I run this on a Mac OS X 10.6 machine after compiling with gcc, the output is:

&G   = 0x100001078
&s   = 0x10000107c
&a   = 0x7fff5fbff2bc
&p   = 0x7fff5fbff2b0
p    = 0x100100080
main = 0x100000e18

The interesting thing here is that we can see how the compiler chooses to allocate space for variables based on their storage classes. The global variable G and the static local variable s both persist between function calls, so they get placed in the BSS segment (see .bss) that starts somewhere around 0x100000000, typically after the code segment containing the actual code of the program. Local variables a and p are allocated on the stack, which grows down from somewhere near the top of the address space. The block that malloc returns that p points to is allocated off the heap, a region of memory that may also grow over time and starts after the BSS segment. Finally, main appears at 0x100000e18; this is in the code segment, which is a bit lower in memory than all the global variables.

32. The null pointer

The special value 0, known as the null pointer may be assigned to a pointer of any type. It may or may not be represented by the actual address 0, but it will act like 0 in all contexts (e.g., it has the value false in an if or while statement). Null pointers are often used to indicate missing data or failed functions. Attempting to dereference a null pointer can have catastrophic effects, so it's important to be aware of when you might be supplied with one.

33. Pointers and functions

A simple application of pointers is to get around C's limit on having only one return value from a function. Because C arguments are copied, assigning a value to an argument inside a function has no effect on the outside. So the doubler function below doesn't do much:

   1 #include <stdio.h>
   2 
   3 /* doesn't work */
   4 void
   5 doubler(int x)
   6 {
   7     x *= 2;
   8 }
   9 
  10 int
  11 main(int argc, char **argv)
  12 {
  13     int y;
  14 
  15     y = 1;
  16 
  17     doubler(y);                 /* no effect on y */
  18 
  19     printf("%d\n", y);          /* prints 1 */
  20 
  21     return 0;
  22 }
bad_doubler.c

However, if instead of passing the value of y into doubler we pass a pointer to y, then the doubler function can reach out of its own stack frame to manipulate y itself:

   1 #include <stdio.h>
   2 
   3 /* doesn't work */
   4 void
   5 doubler(int *x)
   6 {
   7     *x *= 2;
   8 }
   9 
  10 int
  11 main(int argc, char **argv)
  12 {
  13     int y;
  14 
  15     y = 1;
  16 
  17     doubler(&y);                /* sets y to 2 */
  18 
  19     printf("%d\n", y);          /* prints 2 */
  20 
  21     return 0;
  22 }
good_doubler.c

Generally, if you pass the value of a variable into a function (with no &), you can be assured that the function can't modify your original variable. When you pass a pointer, you should assume that the function can and will change the variable's value. If you want to write a function that takes a pointer argument but promises not to modify the target of the pointer, use const, like this:

   1 void
   2 printPointerTarget(const int *p)
   3 {
   4     printf("%d\n", *p);
   5 }

The const qualifier tells the compiler that the target of the pointer shouldn't be modified. This will cause it to return an error if you try to assign to it anyway:

   1 void
   2 printPointerTarget(const int *p)
   3 {
   4     *p = 5;  /* produces compile-time error */
   5     printf("%d\n", *p);
   6 }

Passing const pointers is mostly used when passing large structures to functions, where copying a 32-bit pointer is cheaper than copying the thing it points to.

If you really want to modify the target anyway, C lets you "cast away const":

   1 void
   2 printPointerTarget(const int *p)
   3 {
   4     *((int *) p) = 5;  /* produces compile-time error */
   5     printf("%d\n", *p);
   6 }

There is usually no good reason to do this; the one exception might be if the target of the pointer represents an AbstractDataType, and you want to modify its representation during some operation to optimize things somehow in a way that will not be visible outside the abstraction barrier, making it appear to leave the target constant.

Note that while it is safe to pass pointers down into functions, it is very dangerous to pass pointers up. The reason is that the space used to hold any local variable of the function will be reclaimed when the function exits, but the pointer will still point to the same location, even though something else may now be stored there. So this function is very dangerous:

   1 int *
   2 dangerous(void)
   3 {
   4     int n;
   5 
   6     return &n;          /* NO! */
   7 }
   8 
   9 ...
  10 
  11     *dangerous() = 12;  /* writes 12 to some unknown location */

An exception is when you can guarantee that the location pointed to will survive even after the function exits, e.g. when the location is dynamically allocated using malloc (see below) or when the local variable is declared static:

   1 int *
   2 returnStatic(void)
   3 {
   4     static int n;
   5 
   6     return &n;
   7 }
   8 
   9 ...
  10 
  11     *returnStatic() = 12;       /* writes 12 to the hidden static variable */

Usually returning a pointer to a static local variable is not good practice, since the point of making a variable local is to keep outsiders from getting at it. If you find yourself tempted to do this, a better approach is to allocate a new block using malloc (see below) and return a pointer to that. The downside of the malloc method is that the caller has to promise to call free on the block later, or you will get a storage leak.

34. Pointer arithmetic and arrays

Because pointers are just numerical values, one can do arithmetic on them. Specifically, it is permitted to

  • Add an integer to a pointer or subtract an integer from a pointer. The effect of p+n where p is a pointer and n is an integer is to compute the address equal to p plus n times the size of whatever p points to (this is why int * pointers and char * pointers aren't the same).

  • Subtract one pointer from another. The two pointers must have the same type (e.g. both int * or both char *). The result is an integer value, equal to the numerical difference between the addresses divided by the size of the objects pointed to.

  • Compare two pointers using ==, !=, <, >, <=, or >=.

  • Increment or decrement a pointer using ++ or --.

The main application of pointer arithmetic in C is in arrays. An array is a block of memory that holds one or more objects of a given type. It is declared by giving the type of object the array holds followed by the array name and the size in square brackets:

   1     int a[50];          /* array of 50 ints */
   2     char *cp[100];      /* array of 100 pointers to char */

Declaring an array allocates enough space to hold the specified number of objects (e.g. 200 bytes for a above and 400 for cp---note that a char * is an address, so it is much bigger than a char). The number inside the square brackets must be a constant whose value can be determined at compile time.

The array name acts like a constant pointer to the zeroth element of the array. It is thus possible to set or read the zeroth element using *a. But because the array name is constant, you can't assign to it:

   1     *a = 12;            /* sets zeroth element to 12 */
   2 
   3     a = &n;             /* #### DOESN'T WORK #### */

More common is to use square brackets to refer to a particular element of the array. The expression a[n] is defined to be equivalent to *(a+n); the index n (an integer) is added to the base of the array (a pointer), to get to the location of the n-th element of a. The implicit * then dereferences this location so that you can read its value (in a normal expression) or assign to it (on the left-hand side of an assignment operator). The effect is to allow you to use a[n] just as you would any other variable of type int (or whatever type a was declared as).

Note that C doesn't do any sort of bounds checking. Given the declaration int a[50];, only indices from a[0] to a[49] can be used safely. However, the compiler will not blink at a[-12] or a[10000]. If you read from such a location you will get garbage data; if you write to it, you will overwrite god-knows-what, possibly trashing some other variable somewhere else in your program or some critical part of the stack (like the location to jump to when you return from a function). It is up to you as a programmer to avoid such buffer overruns, which can lead to very mysterious (and in the case of code that gets input from a network, security-damaging) bugs. The valgrind program can help detect such overruns in some cases (see C/valgrind).

Another curious feature of the definition of a[n] as identical to *(a+n) is that it doesn't actually matter which of the array name or the index goes inside the braces. So all of a[0], *a, and 0[a] refer to the zeroth entry in a. Unless you are deliberately trying to obfuscate your code, it's best to write what you mean.

34.1. Arrays and functions

Because array names act like pointers, they can be passed into functions that expect pointers as their arguments. For example, here is a function that computes the sum of all the values in an array a of size n:

   1 /* return the sum of the values in a, an array of size n */
   2 int
   3 sumArray(int in, const int *a)
   4 {
   5     int i;
   6     int sum;
   7 
   8     sum = 0;
   9     for(i = 0; i < n; i++) {
  10         sum += a[i];
  11     }
  12 
  13     return sum;
  14 }

Note the use of const to promise that sumArray won't modify the contents of a.

Another way to write the function header is to declare a as an array of unknown size:

   1 /* return the sum of the values in a, an array of size n */
   2 int
   3 sumArray(int n, const int a[])
   4 {
   5     ...
   6 }

This has exactly the same meaning to the compiler as the previous definition. Even though normally the declarations int a[10] and int *a mean very different things (the first one allocates space to hold 10 ints, and prevents assigning a new value to a), in a function argument int a[] is just SyntacticSugar for int *a. You can even modify what a points to inside sumArray by assigning to it. This will allow you to do things that you usually don't want to do, like write this hideous routine:

   1 /* return the sum of the values in a, an array of size n */
   2 int
   3 sumArray(int n, const int a[])
   4 {
   5     const int *an;      /* pointer to first element not in a */
   6     int sum;
   7 
   8     sum = 0;
   9     an = a+n;
  10 
  11     while(a < an) {
  12         sum += *a++;
  13     }
  14 
  15     return sum;
  16 }

34.2. Multidimensional arrays

Arrays can themselves be members of arrays. The result is a multidimensional array, where a value in row i and column j is accessed by a[i][j].

Declaration is similar to one-dimensional arrays:

   1 int a[6][4];    /* declares an array of 6 rows of 4 ints each */

This declaration produces an array of 24 int values, packed contiguously in memory. The interpretation is that a is an array of 6 objects, each of which is an array of 4 ints.

If we imagine the array to contain increasing values like this:

 0  1  2  3  4  5
 6  7  8  9 10 11
12 13 14 15 16 17

the actual positions in memory will look like this:

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
 ^                 ^                 ^
a[0]              a[1]              a[2]

To look up a value, we do the usual array-indexing magic. Suppose we want to find a[1][4]. The name a acts as a pointer to the base of the array.The name a[1] says to skip ahead 1 times the size of the things pointed to by a, which are arrays of 6 ints each, for a total size of 24 bytes assuming 4-byte ints. For a[1][4], we start at a[1] and move forward 4 times the size of the thing pointed to by a[1], which is an int; this puts us 24+16 bytes from a, the position of 10 in the picture above.

Like other array declarations, the size must be specified at compile time in pre-C99 C. If this is not desirable, a similar effect can be obtained by allocating each row separately using malloc and building a master list of pointers to rows, of type int **. The downside of this approach is that the array is no longer contiguous (which may affect cache performance) and it requires reading a pointer to find the location of a particular value, instead of just doing address arithmetic starting from the base address of the array. But elements can still be accessed using the a[i][j] syntax. An example of this approach is given in malloc2d.c.

34.3. Variable-length arrays

C99 adds the feature of variable-length arrays, where the size of the array is determined at run-time. These can only appear as local variables in procedures (automatic variables) or in argument lists. In the case of variable-length arrays in argument lists, it is also necessary that the length of the array be computable from previous arguments.

For example, we could make the length of the array explicit in our sumArray function:

   1 /* return the sum of the values in a, an array of size n */
   2 int
   3 sumArray(int n, const int a[n])
   4 {
   5     int i;
   6     int sum;
   7 
   8     sum = 0;
   9     for(i = 0; i < n; i++) {
  10         sum += a[i];
  11     }
  12 
  13     return sum;
  14 }

This doesn't accompish much, because the length of the array is not used. However, it does become useful if we have a two-dimensional array, as otherwise there is no way to compute the length of each row:

   1 int
   2 sumMatrix(int rows, int cols, const int m[rows][cols])
   3 {
   4     int i;
   5     int j;
   6     int sum;
   7 
   8     sum = 0;
   9     for(i = 0; i < rows; i++) {
  10         for(j = 0; j < cols; j++) {
  11             sum += a[i][j];
  12         }
  13     }
  14 
  15     return sum;
  16 }

Here the fact that each row of m is known to be an array of cols many ints makes the implicit pointer computation in a[i][j] actually work. It is considerably more difficult to to this in ANSI C; the simplest approach is to pack m into a one-dimensional array and do the address computation explicitly:

   1 int
   2 sumMatrix(int rows, int cols, const int a[])
   3 {
   4     int i;
   5     int j;
   6     int sum;
   7 
   8     sum = 0;
   9     for(i = 0; i < rows; i++) {
  10         for(j = 0; j < cols; j++) {
  11             sum += a[i*cols + j];
  12         }
  13     }
  14 
  15     return sum;
  16 }

Variable-length arrays can sometimes be used for run-time storage allocation, as an alternative to malloc and free (see below). A variable-length array allocated as a local variable will be deallocated when the containing scope (usually a function body, but maybe just a compound statement marked off by braces) exits. One consequence of this is that you can't return a variable-length array from a function.

Here is an example of code using this feature:

   1 /* reverse an array in place */
   2 void
   3 reverseArray(int n, int a[n])
   4 {
   5     /* algorithm: copy to a new array in reverse order */
   6     /* then copy back */
   7 
   8     int i;
   9     int copy[n];
  10 
  11     for(i = 0; i < n; i++) {
  12         /* the -1 is needed to that a[0] goes to a[n-1] etc. */
  13         copy[n-i-1] = a[i];
  14     }
  15 
  16     for(i = 0; i < n; i++) {
  17         a[i] = copy[i];
  18     }
  19 }

While using variable-length arrays for this purpose can simplify code in some cases, as a general programming practice it is extremely dangerous. The reason is that, unlike allocations through malloc, variable-length array allocations are typically allocated on the stack (which is often more constrainted than the heap) and have no way of reporting failure. So if there isn't enough room for your variable-length array, odds are you won't find out until a segmentation fault occurs somewhere later in your code when you try to use it.

(As an additional annoyance, gdb is confused by two-dimensional variable-length arrays.)

Here's a safer version of the above routine, using malloc and free.

   1 /* reverse an array in place */
   2 void
   3 reverseArray(int n, int a[n])
   4 {
   5     /* algorithm: copy to a new array in reverse order */
   6     /* then copy back */
   7 
   8     int i;
   9     int *copy;
  10 
  11     copy = (int *) malloc(n * sizeof(int));
  12     assert(copy);  /* or some other error check */
  13 
  14     for(i = 0; i < n; i++) {
  15         /* the -1 is needed to that a[0] goes to a[n-1] etc. */
  16         copy[n-i-1] = a[i];
  17     }
  18 
  19     for(i = 0; i < n; i++) {
  20         a[i] = copy[i];
  21     }
  22 
  23     free(copy);
  24 }

35. Void pointers

A special pointer type is void *, a "pointer to void". Such pointers are declared in the usual way:

   1     void *nothing;      /* pointer to nothing */

Unlike ordinary pointers, you can't derefence a void * pointer or do arithmetic on it, because the compiler doesn't know what type it points to. However, you are allowed to use a void * as a kind of "raw address" pointer value that you can store arbitrary pointers in. It is permitted to assign to a void * variable from an expression of any pointer type; conversely, a void * pointer value can be assigned to a pointer variable of any type.

If you need to use a void * pointer as a pointer of a particular type in an expression, you can cast it to the appropriate type by prefixing it with a type name in parentheses, like this:

   1     int a[50];          /* typical array of ints */
   2     void *p;            /* dangerous void pointer */
   3 
   4     a[12] = 17;         /* save that valuable 17 */
   5     p = a;              /* p now holds base address of a */
   6 
   7     printf("%d\n", ((int *) p)[12]);  /* get 17 back */

Usually if you have to start writing casts, it's a sign that you are doing something wrong, and you run the danger of violating the type system---say, by tricking the compiler into treating a block of bits that are supposed to be an int as four chars. But violating the type system like this will be necessary for some applications, because even the weak type system in C turns out to be too restrictive for writing certain kinds of "generic" code that work on values of arbitrary types.

36. Run-time storage allocation

C does not permit arrays to be declared with variable sizes. C also doesn't let local variables outlive the function they are declared in. Both features can be awkward if you want to build data structures at run time that have unpredictable (perhaps even changing) sizes and that are intended to persist longer than the functions that create them. To build such structures, the standard C library provides the malloc routine, which asks the operating system for a block of space of a given size (in bytes). With a bit of pushing and shoving, this can be used to obtain a block of space that for all practical purposes acts just like an array.

To use malloc, you must include stdlib.h at the top of your program. The declaration for malloc is

   1 void *malloc(size_t);

where size_t is an integer type (often unsigned long). Calling malloc with an argument of n allocates and returns a pointer to the start of a block of n bytes if possible. If the system can't give you the space you asked for (maybe you asked for more space than it has), malloc returns a null pointer. It is good practice to test the return value of malloc whenever you call it.

Because the return type of malloc is void *, its return value can be assigned to any variable with a pointer type. Computing the size of the block you need is your responsibility---and you will be punished for any mistakes with difficult-to-diagnose buffer overrun errors---but this task is made slightly easier by the built-in sizeof operator that allows you to compute the size in bytes of any particular data type. A typical call to malloc might thus look something like this:

   1 #include <stdlib.h>
   2 
   3 /* allocate and return a new integer array with n elements */
   4 /* calls abort() if there isn't enough space */
   5 int *
   6 makeIntArray(int n)
   7 {
   8     int *a;
   9 
  10     a = malloc(sizeof(int) * n);
  11 
  12     if(a == 0) abort();                 /* die on failure */
  13 
  14     return a;
  15 }

When you are done with a malloc'd region, you should return the space to the system using the free routine, also defined in stdlib.h. If you don't do this, your program will quickly run out of space. The free routine takes a void * as its argument and returns nothing. It is good practice to write a matching destructor that de-allocates an object for each constructor (like makeIntArray) that makes one.

   1 void
   2 destroyIntArray(int *a)
   3 {
   4     free(a);
   5 }

It is a serious error to do anything at all with a block after it has been freed.

It is also possible to grow or shrink a previously allocated block. This is done using the realloc function, which is declared as

   1 void *realloc(void *oldBlock, size_t newSize);

The realloc function returns a pointer to the resized block. It may or may not allocate a new block; if there is room, it may leave the old block in place and return its argument. But it may allocate a new block and copy the contents of the old block, so you should assume that the old pointer has been freed.

Here's a typical use of realloc to build an array that grows as large as it needs to be:

   1 /* read numbers from stdin until there aren't any more */
   2 /* returns an array of all numbers read, or null on error */
   3 /* returns the count of numbers read in *count */
   4 int *
   5 readNumbers(int *count /* RETVAL */)
   6 {
   7     int mycount;        /* number of numbers read */
   8     int size;           /* size of block allocated so far */
   9     int *a;             /* block */
  10     int n;              /* number read */
  11 
  12     mycount = 0;
  13     size = 1;
  14 
  15     a = malloc(sizeof(int) * size);     /* allocating zero bytes is tricky */
  16     if(a == 0) return 0;
  17 
  18     while(scanf("%d", &n) == 1) {
  19         /* is there room? */
  20         while(mycount >= size) {
  21             /* double the size to avoid calling realloc for every number read */
  22             size *= 2;
  23             a = realloc(a, sizeof(int) * size);
  24             if(a == 0) return 0;
  25         }
  26 
  27         /* put the new number in */
  28         a[mycount++] = n;
  29     }
  30 
  31     /* now trim off any excess space */
  32     a = realloc(a, sizeof(int) * mycount);
  33     /* note: if a == 0 at this point we'll just return it anyway */
  34 
  35     /* save out mycount */
  36     *count = mycount;
  37 
  38     return a;
  39 }

Because errors involving malloc and its friends can be very difficult to spot, it is recommended to test any program that uses malloc using valgrind if possible. (See C/valgrind).

(See also C/DynamicStorageAllocation for some old notes on this subject.)

37. The restrict keyword

In C99, it is possible to declare that a pointer variable is the only way to reach its target as long as it is in scope. This is not enforced by the compiler; instead, it is a promise from the programmer to the compiler that any data reached through this point will not be changed by other parts of the code, which allows the compiler to optimize code in ways that are not possible if pointers might point to the same place (a phenomenon called pointer aliasing). For example, in the short routine:

   1 // write 1 + *src to *dst and return *src
   2 int
   3 copyPlusOne(int * restrict dst, int * restrict src)
   4 {
   5     *dst = *src + 1;
   6     return *src;
   7 }

the output of gcc -std=c99 -O3 -S includes one more instruction if the restrict qualifiers are removed. The reason is that if dst and src may point to the same location, src needs to be re-read for the return statement, in case it changed, but if they don't, the compiler can re-use the previous value it already has in one of the CPU registers.

For most code, this feature is useless, and potentially dangerous if someone calls your routine with aliased pointers. However, it may sometimes be possible to increase performance of time-critical code by adding a restrict keyword. The cost is that the code might no longer work if called with aliased pointers.


CategoryProgrammingNotes

38. C/DynamicStorageAllocation

WARNING: These are old, buggy notes from many years ago, badly converted to wiki format. Even the code examples are broken in various ways. For more up-to-date notes on the subject, see C/Pointers.

39. Dynamic storage allocation in C

40. Overview

The C programming language does not provide a built-in mechanism for allocating space at run-time. Instead, the standard library contains three functions for managing a region of memory called the heap. The implementation of these functions is similar to the implementation of the alloc function in Section 5.3 of K&R. They are declared in stdlib.h, so you will need to #include that header file before using them.

41. Allocating space with malloc

The function for allocating space is malloc. Its declaration is:

   1 void *malloc(size_t size);

The return type, (void *), is the generic pointer type in C; it can be cast freely to and from other pointer types, and is intended to represent an address unadorned with any knowledge of what is stored at that address. The argument is of type size_t (which is typically an alias for unsigned long declared using typedef), and indicates how many bytes of storage you want to grab. You can find out how many bytes you need using sizeof; for example, if you want to allocate an array of n values of type int, you could write:

   1 int *a;
   2 
   3 a = malloc(sizeof(int) * n);

After executing the malloc operation, you can treat a as an ordinary array of ints, just as if you had declared it with int a[10];.

41.1. Testing the return value of malloc

It is generally a good idea to test the return value of malloc, since it might return a null pointer if it can't get you the space you asked for. Here is a typical test:

   1 a = malloc(sizeof(*a) * SOME_HUGE_VALUE);
   2 if(a == 0) {
   3 /* respond to the disastrous error gracefully */
   4 fprintf(stderr, "AAAAAARGGHGGHHGH!\n");
   5 abort();
   6 }

42. Freeing space with free

The free function has the opposite purpose of malloc; given something you've previous allocated, you can call free to give it up.

The declaration for free looks like this:

   1 void free(void *);

And a typical use might be:

   1 int *a;
   2 
   3 a = malloc(sizeof(*a) * n);
   4 /* use a for a while */
   5 free(a);

42.1. Storage allocation discipline

It is very important to free everything you allocate when you are done with it. Failure to do this causes storage leaks, bugs in which a program slowly consumes more and more memory over time until it can't allocate any more. Whenever you allocate something, you should have a plan for how to get rid of it. Some typical cases: 1.A block is allocated within some function and used only inside that function. In this case it should be freed before the function returns. 1.A block is allocated inside a function and then returned to the function's caller. The caller is now responsible for freeing the block (and you should put a comment on the function to this effect). If a function constructs some complicated data structure that can't simply be fed to free, it is polite to provide a second destructor function that tears down the data structure and frees all of its internal pieces. The same discipline that applies to malloc and free applies to constructors and destructors; if you make something by calling a constructor function you are responsible for making sure that the destructor is called when you are done with it.

Though it is not a problem to have un-freed blocks when a program exits, since the operating system will recover all space the program used at this point, good programming style dictates freeing your storage on the way out. This will keep you disciplined about balancing calls to malloc and free, and becomes especially important if somebody takes your program and rewrites it as a subroutine in a larger program. When using constructors and destructors, calling the destructor also takes care of any additional housekeeping the destructor might do.

42.2. Perils of buffer overflows

Writing off the end of a block allocated with malloc is particularly dangerous, since it is likely to overwrite internal state used by the storage allocator. Such bugs can be very hard to find, since the symptom of such a bug may be that malloc or free mysteriously segfaults on some completely unrelated call much later in the execution of the program.

One particular trap of this sort that is easy to fall into is to forget to allocate an extra byte to store the '\0' character at the end of the string. Here is an example of the right way to do this: {{{/* returns a malloc'd copy of its argument, or 0 if malloc fails */ char * my_strdup(const char *s) { char *s2;

s2 = malloc(strlen(s)+1); if(s2 == 0) return 0; strcpy(s2, s); return s2; } }}}

42.3. The lingering curse of referencing freed blocks

A common error is to free a block and then reference it. Don't do this; you don't know if malloc has already given out the storage to somebody else, or if free scribbled over part of the block as part of its internal operations. If you need information from a block that you are about to free, copy it somewhere else first.

43. Dynamic arrays using realloc

Some languages provide a dynamic array type whose size can be changed after the array is created. C does not, but a similar effect can be obtained using the realloc function. This function takes a block that was previously obtained from malloc and changes its size. If there is not enough room to keep the block where it is, realloc makes a new copy of the block and frees the original. In either case it returns a pointer to the new location. Like malloc, realloc returns 0 if it fails; however, in this case the original block is not changed or freed.

An example of realloc in action:

   1 int *a;
   2 
   3 a = malloc(sizeof(*a) * 10);        /* now a holds 10 ints */
   4 a = realloc(a, sizeof(*a) * 20);    /* now a holds 20 ints */
   5 free(a);                            /* now a is gone! */

Note that calling realloc a lot can get pretty expensive, since it may need to copy the entire contents of a possibly very large block every time you call it. If you know that you are going to be growing a block many times, one trick is to keep track of its actual size separately from the size of the part of it that is used, and double the actual size only when needed. In this case the amortized cost of the reallocations is just a constant amount for each new element.

44. Example of building a data structure with malloc

Here is an example of a program that includes a constructor and destructor function for a two-dimensional array data structure (implemented as an array of pointers to pointers) that requires many calls to malloc to construct:

44.0.1. malloc2d.c

   1 /* Demo program for malloc'd two-dimensional arrays */
   2 
   3 #include <stdio.h>
   4 #include <stdlib.h>
   5 
   6 /* returns a two-dimensional array with num_rows rows and 
   7 * rowsize bytes per row, or 0 on allocation failure.
   8 * The caller is responsible for freeing the result with free2d. */
   9 void **
  10 malloc2d (size_t num_rows, size_t rowsize)
  11 {
  12   void **a;
  13   size_t i;
  14 
  15 /* a is an array of void * pointers that point to the rows */
  16 /* The last element is 0, so free2d can detect the last row */
  17   a = malloc (sizeof (void *) * (num_rows + 1));        /* one extra for sentinel */
  18   if (a == 0)
  19     return 0;
  20 
  21 /* now allocate the actual rows */
  22   for (i = 0; i < num_rows; i++)
  23     {
  24       a[i] = malloc (rowsize);
  25       if (a[i] == 0)
  26         {
  27 /* if we were being really careful, we'd free previous
  28 * rows and a here. */
  29           return 0;
  30         }
  31     }
  32 
  33 /* initialize the sentinel value */
  34   a[num_rows] = 0;
  35 
  36   return a;
  37 }
  38 
  39 /* frees a 2d array created by malloc2d */
  40 void
  41 free2d (void **a)
  42 {
  43   void **row;
  44 
  45 /* notice how we free everything malloc'd in malloc2d in reverse order */
  46   for (row = a; *row != 0; row++)
  47     {
  48       free (*row);
  49     }
  50   free (a);
  51 }
  52 
  53 int
  54 main (int argc, char **argv)
  55 {
  56   int rows;
  57   int cols;
  58   int **a;
  59   int i;
  60   int j;
  61 
  62   if (argc != 3)
  63     {
  64       fprintf (stderr, "Usage: %s rows cols\n", argv[0]);
  65       return 1;
  66     }
  67 /* else */
  68 
  69   rows = atoi (argv[1]);
  70   cols = atoi (argv[2]);
  71 
  72   a = malloc2d (rows, cols * sizeof (int));
  73   if (a == 0)
  74     {
  75       fprintf (stderr, "malloc2d failed, exiting\n");
  76       return 2;
  77     }
  78 
  79   for (i = 0; i < rows; i++)
  80     {
  81       for (j = 0; j < cols; j++)
  82         {
  83           a[i][j] = i - j;
  84         }
  85     }
  86 
  87   for (i = 0; i < rows; i++)
  88     {
  89       for (j = 0; j < cols; j++)
  90         {
  91           printf ("%4d", a[i][j]);
  92         }
  93       putchar ('\n');
  94     }
  95 
  96   free2d (a);                   /* always clean up */
  97 
  98   return 0;
  99 }

44.0.2. Makefile

CC= gcc
CFLAGS=-g3 -ansi -pedantic -Wall

all:

test: malloc2d
        ./malloc2d 4 6
        @echo OK!

malloc2d: malloc2d.o
        $(CC) $(CFLAGS) -o $@ $^

clean:
        rm -f *.o malloc2d

44.0.3. Output of make test

{{{$ make test gcc -g3 -ansi -pedantic -Wall -c -o malloc2d.o malloc2d.c malloc2d.c: In function `main': malloc2d.c:68: warning: assignment from incompatible pointer type malloc2d.c:87: warning: passing arg 1 of `free2d' from incompatible pointer type gcc -g3 -ansi -pedantic -Wall -o malloc2d malloc2d.o ./malloc2d 4 6 0 -1 -2 -3 -4 -5 1 0 -1 -2 -3 -4 2 1 0 -1 -2 -3 3 2 1 0 -1 -2 OK! }}}


CategoryProgrammingNotes

45. C/Strings

46. String processing in general

Processing strings of characters is one of the oldest application of mechanical computers, arguably predating numerical computation by at least fifty years. Assuming you've already solved the problem of how to represent characters in memory (e.g. as the C char type encoded in ASCII), there are two standard ways to represent strings:

  • As a delimited string, where the end of a string is marked by a special character. The advantages of this method are that only one extra byte is needed to indicate the length of an arbitrarily long string, that strings can be manipulated by simple pointer operations, and in some cases that common string operations that involve processing the entire string can be performed very quickly. The disadvantage is that the delimiter can't appear inside any string, which limits what kind of data you can store in a string.

  • As a counted string, where the string data is prefixed or supplemented with an explicit count of the number of characters in the string. The advantage of this representation is that a string can hold arbitrary data (including delimiter characters) and that one can quickly jump to the end of the string without having to scan its entire length. The disadvantage is that maintaining a separate count typically requires more space than adding a one-byte delimiter (unless you limit your string length to 255 characters) and that more care needs to be taken to make sure that the count is correct.

47. C strings

Because delimited strings are more lightweight, C went for delimited strings. A string is a sequence of characters terminated by a null character '\0'. Note that the null character is not the same as a null pointer, although both appear to have the value 0 when used in integer contexts. A string is represented by a variable of type char *, which points to the zeroth character of the string. The programmer is responsible for allocating and managing space to store strings, except for explicit string constants, which are stored in a special non-writable string space by the compiler.

If you want to use counted strings instead, you can build your own using a struct (see C/Structs). Most scripting languages written in C (e.g. Perl, Python_programming_language, PHP, etc.) use this approach internally. (Tcl is an exception, which is one of many good reasons not to use Tcl).

48. String constants

A string constant in C is represented by a sequence of characters within double quotes. Standard C character escape sequences like \n (newline), \r (carriage return), \a (bell), \0x17 (character with hexadecimal code 0x17), \\ (backslash), and \" (double quote) can all be used inside string constants. The value of a string constant has type const char *, and can be assigned to variables and passed as function arguments or return values of this type.

Two string constants separated only by whitespace will be concatenated by the compiler as a single constant: "foo" "bar" is the same as "foobar". This feature is not used in normal code much, but shows up sometimes in macros (see C/Macros).

49. String buffers

The problem with string constants is that you can't modify them. If you want to build strings on the fly, you will need to allocate space for them. The traditional approach is to use a buffer, an array of chars. Here is a particularly painful hello-world program that builds a string by hand:

   1 #include <stdio.h>
   2 
   3 int
   4 main(int argc, char **argv)
   5 {
   6     char hi[3];
   7 
   8     hi[0] = 'h';
   9     hi[1] = 'i';
  10     hi[2] = '\0';
  11 
  12     puts(hi);
  13 
  14     return 0;
  15 }
hi.c

Note that the buffer needs to have size at least 3 in order to hold all three characters. A common error in programming with C strings is to forget to leave space for the null at the end (or to forget to add the null, which can have comical results depending on what you are using your surprisingly long string for).

50. Operations on strings

Unlike many programming languages, C provides only a rudimentary string-processing library. The reason is that many common string-processing tasks in C can be done very quickly by hand.

For example, suppose we want to copy a string from one buffer to another. The library function strcpy declared in string.h will do this for us (and is usually the right thing to use), but if it didn't exist we could write something very close to it using a famous C idiom.

   1 void
   2 strcpy2(char *dest, const char *src)
   3 {
   4     /* This line copies characters one at a time from *src to *dest. */
   5     /* The postincrements increment the pointers (++ binds tighter than *) */
   6     /*  to get to the next locations on the next iteration through the loop. */
   7     /* The loop terminates when *src == '\0' == 0. */
   8     /* There is no loop body because there is nothing to do there. */
   9     while(*dest++ = *src++);
  10 }

The externally visible difference between strcpy2 and the original strcpy is that strcpy returns a char * equal to its first argument. It is also likely that any implementation of strcpy found in a recent C library takes advantage of the width of the memory data path to copy more than one character at a time.

Most C programmers will recognize the while(*dest++ = *src++); from having seen it before, although experienced C programmers will generally be able to figure out what such highly abbreviated constructions mean. Exposure to such constructions is arguably a form of hazing.

Because C pointers act exactly like array names, you can also write strcpy2 using explicit array indices. The result is longer but may be more readable if you aren't a C fanatic.

   1 char *
   2 strcpy2a(char *dest, const char *src)
   3 {
   4     int ;
   5 
   6     i = 0;
   7     for(i = 0; src[i] != '\0'; i++) {
   8         dest[i] = src[i];
   9     }
  10 
  11     /* note that the final null in src is not copied by the loop */
  12     dest[i] = '\0';
  13 
  14     return dest;
  15 }

An advantage of using a separate index in strcpy2a is that we don't trash dest, so we can return it just like strcpy does. (In fairness, strcpy2 could have saved a copy of the original location of dest and done the same thing.)

Note that nothing in strcpy2, strcpy2a, or the original strcpy will save you if dest points to a region of memory that isn't big enough to hold the string at src, or if somebody forget to tack a null on the end of src (in which case strcpy will just keep going until it finds a null character somewhere). As elsewhere, it's your job as a programming to make sure there is enough room. Since the compiler has no idea what dest points to, this means that you have to remember how much room is available there yourself.

If you are worried about overrunning dest, you could use strncpy instead. The strncpy function takes a third argument that gives the maximum number of characters to copy; however, if src doesn't contain a null character in this range, the resulting string in dest won't either. Usually the only practical application to strncpy is to extract the first k characters of a string, as in

   1 /* copy the substring of src consisting of characters at positions
   2     start..end-1 (inclusive) into dest */
   3 /* If end-1 is past the end of src, copies only as many characters as 
   4     available. */
   5 /* If start is past the end of src, the results are unpredictable. */
   6 /* Returns a pointer to dest */
   7 char *
   8 copySubstring(char *dest, const char *src, int start, int end)
   9 {
  10     /* copy the substring */
  11     strncpy(dest, src + start, end - start);
  12 
  13     /* add null since strncpy probably didn't */
  14     dest[end - start] = '\0';
  15 
  16     return dest;
  17 }

Another quick and dirty way to extract a substring of a string you don't care about (and can write to) is to just drop a null character in the middle of the sacrificial string. This is generally a bad idea unless you are certain you aren't going to need the original string again, but it's a surprisingly common practice among C programmers of a certain age.

A similar operation to strcpy is strcat. The difference is that strcat concatenates src on to the end of dest; so that if dest previous pointed to "abc" and src to "def", dest will now point to "abcdef". Like strcpy, strcat returns its first argument. A no-return-value version of strcat is given below.

   1 void
   2 strcat2(char *dest, const char *src)
   3 {
   4     while(*dest) dest++;
   5     while(*dest++ = *src++);
   6 }

Decoding this abomination is left as an exercise for the reader. There is also a function strncat which has the same relationship to strcat that strncpy has to strcpy.

As with strcpy, the actual implementation of strcat may be much more subtle, and is likely to be faster than rolling your own.

51. Finding the length of a string

Because the length of a string is of fundamental importance in C (e.g., when deciding if you can safely copy it somewhere else), the standard C library provides a function strlen that counts the number of non-null characters in a string. Here's a possible implementation:

   1 int
   2 strlen(const char *s)
   3 {
   4     int i;
   5 
   6     for(i = 0; *s; i++, s++);
   7 
   8     return i;
   9 }

Note the use of the comma operator in the increment step. The comma operator applied to two expressions evaluates both of them and discards the value of the first; it is usually used only in for loops where you want to initialize or advance more than one variable at once.

Like the other string routines, using strlen requires including string.h.

51.1. The strlen tarpit

A common mistake is to put a call to strlen in the header of a loop; for example:

   1 /* like strcpy, but only copies characters at indices 0, 2, 4, ...
   2    from src to dest */
   3 char *
   4 copyEvenCharactersBadVersion(char *dest, const char *src)
   5 {
   6     int i;
   7     int j;
   8 
   9     /* BAD: Calls strlen on every pass through the loop */
  10     for(i = 0, j = 0; i < strlen(src); i += 2, j++) {
  11         dest[j] = src[i];
  12     }
  13 
  14     dest[j] = '\0';
  15 
  16     return dest;
  17 }

The problem is that strlen has to scan all of src every time the test is done, which adds time proportional to the length of src to each iteration of the loop. So copyEvenCharactersBadVersion takes time proportional to the square of the length of src.

Here's a faster version:

   1 /* like strcpy, but only copies characters at indices 0, 2, 4, ...
   2    from src to dest */
   3 char *
   4 copyEvenCharacters(char *dest, const char *src)
   5 {
   6     int i;
   7     int j;
   8     int len;    /* length of src */
   9 
  10     len = strlen(src);
  11 
  12     /* GOOD: uses cached value of strlen(src) */
  13     for(i = 0, j = 0; i < len; i += 2, j++) {
  14         dest[j] = src[i];
  15     }
  16 
  17     dest[j] = '\0';
  18 
  19     return dest;
  20 }

Because it doesn't call strlen all the time, this version of copyEvenCharacters will run much faster than the original even on small strings, and several million times faster if src is a megabyte long.

52. Comparing strings

If you want to test if strings s1 and s2 contain the same characters, writing s1 == s2 won't work, since this tests instead whether s1 and s2 point to the same address. Instead, you should use strcmp, declared in string.h. The strcmp function walks along both of its arguments until it either hits a null on both and returns 0, or hits two different characters, and returns a positive integer if the first string's character is bigger and a negative integer if the second string's character is bigger (a typical implementation will just subtract the two characters). A possible but slow implementation might look like this:

   1 int
   2 strcmp(const char *s1, const char *s2)
   3 {
   4     while(*s1 && *s2 && *s1 == *s2) {
   5         s1++;
   6         s2++;
   7     }
   8 
   9     return *s1 - *s2;
  10 }

(The reason this implementation is slow on modern hardware is that it only compares the strings one character at a time; it is almost always faster to compare four characters at once on a 32-bit architecture, although doing so requires no end of trickiness to detect the end of the strings. It is also likely that whatever C library you are using contains even faster hand-coded assembly language versions of strcmp and the other string routines for most of the CPU architectures you are likely to use. Under some circumstances, the compiler when running with the optimizer turned on may even omit a function call entirely and just patch the appropriate assembly-language code directly into whatever routine calls strcmp, strlen etc. As a programmer, you should not be able to detect that any of these optimizations are happening, but they are another reason to use standard C language or library features when you can.)

To use strcmp to test equality, test if the return value is 0:

   1     if(strcmp(s1, s2) == 0) {
   2         /* strings are equal */
   3         ...
   4     }

You may sometimes see this idiom instead:

   1     if(!strcmp(s1, s2)) {
   2         /* strings are equal */
   3         ...
   4     }

My own feeling is that the first version is more clear, since !strcmp always suggested to me that you were testing for not some property (e.g. not equal). But if you think of strcmp as telling you when two strings are different rather than when they are equal, this may not be so confusing.

53. Formatted output to strings

You can write formatted output to a string buffer with sprintf just like you can write it to stdout with printf or to a file with fprintf. Make sure when you do so that there is enough room in the buffer you are writing to, or the usual bad things will happen.

54. Dynamic allocation of strings

When allocating space for a copy of a string s using malloc, the required space is strlen(s)+1. Don't forget the +1, or bad things may happen.5 Because allocating space for a copy of a string is such a common operation, many C libraries provide a strdup function that does exactly this. If you don't have one (it's not required by the C standard), you can write your own like this:

   1 /* return a freshly-malloc'd copy of s */
   2 /* or 0 if malloc fails */
   3 /* It is the caller's responsibility to free the returned string when done. */
   4 char *
   5 strdup(const char *s)
   6 {
   7     char *s2;
   8 
   9     s2 = malloc(strlen(s)+1);
  10 
  11     if(s2 != 0) {
  12         strcpy(s2, s);
  13     }
  14 
  15     return s2;
  16 }

Exercise: Write a function strcat_alloc that returns a freshly-malloc'd string that concatenates its two arguments. Exactly how many bytes do you need to allocate?

55. argc and argv

Now that we know about strings, we can finally do something with argv. Recall that argv in main is declared as char **; this means that it is a pointer to a pointer to a char, or in this case the base address of an array of pointers to char, where each such pointer references a string. These strings correspond to the command-line arguments to your program, with the program name itself appearing in argv[0]6 The count argc counts all arguments including argv[0]; it is 1 if your program is called with no arguments and larger otherwise.

Here is a program that prints its arguments. If you get confused about what argc and argv do, feel free to compile this and play with it:

   1 #include <stdio.h>
   2 
   3 int
   4 main(int argc, char **argv)
   5 {
   6     int i;
   7 
   8     printf("argc = %d\n\n", argc);
   9 
  10     for(i = 0; i < argc; i++) {
  11 	printf("argv[%d] = %s\n", i, argv[i]);
  12     }
  13 
  14     return 0;
  15 }
print_args.c

Like strings, C terminates argv with a null: the value of argv[argc] is always 0 (a null pointer to char). In principle this allows you to recover argc if you lose it.


CategoryProgrammingNotes

56. C/Structs

57. Structs

A struct is a way to define a type that consists of one or more other types pasted together. Here's a typical struct definition:

   1 struct string {
   2     int length;
   3     char *data;
   4 };

This defines a new type struct string that can be used anywhere you would use a simple type like int or float. When you declare a variable with type struct string, the compiler allocates enough space to hold both an int and a char * (8 bytes on a typical 32-bit machine). You can get at the individual components using the . operator, like this:

   1 struct string {
   2     int length;
   3     char *data;
   4 };
   5 
   6 int
   7 main(int argc, char **argv)
   8 {
   9     struct string s;
  10 
  11     s.length = 4;
  12     s.data = "this string is a lot longer than you think";
  13 
  14     puts(s.data);
  15 
  16     return 0;
  17 }
struct_example.c

Variables of type struct can be assigned to, passed into functions, returned from functions, and tested for equality, just like any other type. Each such operation is applied componentwise; for example, s1 = s2; is equivalent to s1.length = s2.length; s1.data = s2.data; and s1 == s2 is equivalent to s1.length == s2.length && s1.data = s2.data.

These operations are not used as often as you might think: typically, instead of copying around entire structures, C programs pass around pointers, as is done with arrays. Pointers to structs are common enough in C that a special syntax is provided for dereferencing them.7 Suppose we have:

   1     struct string s;            /* a struct */
   2     struct string *sp;          /* a pointer to a struct */
   3 
   4     s.length = 4;
   5     s.data = "another overly long string";
   6 
   7     sp = &s;

`

We can then refer to elements of the struct string that sp points to (i.e. s) in either of two ways:

   1     puts((*sp).data);
   2     puts(sp->data);

The second is more common, since it involves typing fewer parentheses. It is an error to write *sp.data in this case; since . binds tighter than *, the compiler will attempt to evaluate sp.data first and generate an error, since sp doesn't have a data field.

Pointers to structs are commonly used in defining AbstractDataTypes, since it is possible to declare that a function returns e.g. a struct string * without specifying the components of a struct string. (All pointers to structs in C have the same size and structure, so the compiler doesn't need to know the components to pass around the address.) Hiding the components discourages code that shouldn't look at them from doing so, and can be used, for example, to enforce consistency between fields.

For example, suppose we wanted to define a struct string * type that held counted strings that could only be accessed through a restricted interface that prevented (for example) the user from changing the string or its length. We might create a file myString.h that contained the declarations:

   1 /* make a struct string * that holds a copy of s */
   2 struct string *makeString(const char *s);
   3 
   4 /* destroy a struct string * */
   5 void destroyString(struct string *);
   6 
   7 /* return the length of a struct string * */
   8 int stringLength(struct string *);
   9 
  10 /* return the character at position index in the struct string * */
  11 /* or returns -1 if index is out of bounds */
  12 int stringCharAt(struct string *s, int index);
myString.h

and then the actual implementation in myString.c would be the only place where the components of a struct string were defined:

   1 #include <stdlib.h>
   2 #include <string.h>
   3 
   4 #include "myString.h"
   5 
   6 struct string {
   7     int length;
   8     char *data;
   9 };
  10 
  11 struct string *
  12 makeString(const char *s)
  13 {
  14     struct string *s2;
  15 
  16     s2 = malloc(sizeof(struct string));
  17     if(s2 == 0) return 0;
  18 
  19     s2->length = strlen(s);
  20 
  21     s2->data = malloc(s2->length);
  22     if(s2->data == 0) {
  23 	free(s2);
  24 	return 0;
  25     }
  26 
  27     strncpy(s2->data, s, s2->length);
  28 
  29     return s2;
  30 }
  31 
  32 void
  33 destroyString(struct string *s)
  34 {
  35     free(s->data);
  36     free(s);
  37 }
  38 
  39 int
  40 stringLength(struct string *s)
  41 {
  42     return s->length;
  43 }
  44 
  45 int
  46 stringCharAt(struct string *s, int index)
  47 {
  48     if(index < 0 || index >= s->length) {
  49 	return -1;
  50     } else {
  51 	return s->data[index];
  52     }
  53 }
myString.c

In practice, we would probably go even further and replace all the struct string * types with a new name declared with typedef.

58. Unions

A union is just like a struct, except that instead of allocating space to store all the components, the compiler only allocates space to store the largest one, and makes all the components refer to the same address. This can be used to save space if you know that only one of several components will be meaningful for a particular object. An example might be a type representing an object in a LISP-like language like Scheme:

   1 struct lispObject {
   2     int type;           /* type code */
   3     union {
   4         int     intVal;
   5         double  floatVal;
   6         char *  stringVal;
   7         struct {
   8             struct lispObject *car;
   9             struct lispObject *cdr;
  10         } consVal;
  11     } u;
  12 };

Now if you wanted to make a struct lispObject that held an integer value, you might write

   1     lispObject o;
   2 
   3     o.type = TYPE_INT;
   4     o.u.intVal = 27;

where TYPE_INT had presumably been defined somewhere. Note that nothing then prevents you from writing

   1     x = 2.7 * o.u.floatVal;

but the effects will be strange, since it's likely that the bit pattern representing 27 as an int represents something very different as a double. Avoiding such mistakes is your responsibility, which is why most uses of union occur inside larger structs that contain enough information to figure out which variant of the union applies.

59. Bit fields

It is possible to specify the exact number of bits taken up by a member of a struct of integer type. This is seldom useful, but may in principle let you pack more information in less space, e.g.:

   1 struct color {
   2     unsigned int red   : 2;
   3     unsigned int green : 2;
   4     unsigned int blue  : 2;
   5     unsigned int alpha : 2;
   6 };

defines a struct that (probably) occupies only one byte, and supplies four 2-bit fields, each of which can hold values in the range 0-3.


CategoryProgrammingNotes

60. AbstractDataTypes

61. Abstraction

One of the hard parts about computer programming is that, in general, programs are bigger than brains. Unless you have an unusally capacious brain, it is unlikely that you will be able to understand even a modestly large program in its entirety. So in order to be able to write and debug large programs, it is important to be able to break it up into pieces, where each piece can be treated as a tool whose use and description is simpler (and therefor fits in your brain better) than its actual code. Then you can forget about what is happening inside that piece, and just treat it as an easily-understood black box from the outside.

This process of wrapping functionality up in a box and forgetting about its internals is called abstraction, and it is the single most important concept in computer science. In these notes we will describe a particular kind of abstraction, the construction of abstract data types or ADTs. Abstract data types are data types whose implementation is not visible to their user; from the outside, all the user knows about an ADT is what operations can be performed on it and what those operations are supposed to do.

ADTs have an outside and an inside. The outside is called the interface; it consists of the minimal set of type and function declarations needed to use the ADT. The inside is called the implementation; it consists of type and function definitions, and sometime auxiliary data or helper functions, that are not visible to users of the ADT.

62. Example of an abstract data type

Too much abstraction at once can be hard to take, so let's look at a concrete example of an abstract data type. This ADT will represent an infinite sequence of ints. Each instance of the Sequence type supports a single operation seq_next that returns the next int in the sequence. We will also need to provide one or more constructor functions to generate new Sequences, and a destructor function to tear them down.

Here is an example of a typical use of a Sequence:

   1 void
   2 seq_print(Sequence s, int limit)
   3 {
   4     int i;
   5 
   6     for(i = seq_next(s); i < limit; i = seq_next(s)) {
   7         printf("%d\n", i);
   8     }
   9 }

Note that seq_print doesn't need to know anything at all about what a Sequence is or how seq_next works in order to print out all the values in the sequence until it hits one greater than or equal to limit. This is a good thing--- it means that we can use with any implementation of Sequence we like, and we don't have to change it if Sequence or seq_next changes.

62.1. Interface

In C, the interface of an abstract data type will usually be declared in a header file, which is included both in the file that implements the ADT (so that the compiler can check that the declarations match up with the actual definitions in the implementation. Here's a header file for sequences:

62.1.1. sequence.h

   1 /* opaque struct: hides actual components of struct sequence,
   2  * which are defined in sequence.c */
   3 typedef struct sequence *Sequence;
   4 
   5 /* constructors */
   6 /* all our constructors return a null pointer on allocation failure */
   7 
   8 /* returns a Sequence representing init, init+1, init+2, ... */
   9 Sequence seq_create(int init);
  10 
  11 /* returns a Sequence representing init, init+step, init+2*step, ... */
  12 Sequence seq_create_step(int init, int step);
  13 
  14 /* destructor */
  15 /* destroys a Sequence, recovering all interally-allocated data */
  16 void seq_destroy(Sequence);
  17 
  18 /* accessor */
  19 /* returns the first element in a sequence not previously returned */
  20 int seq_next(Sequence);
sequence.h

Here we have defined two different constructors for Sequences, one of which gives slightly more control over the sequence than the other. If we were willing to put more work into the implementation, we could imagine building a very complicated Sequence type that supported a much wider variety of sequences (for example, sequences generated by functions or sequences read from files); but we'll try to keep things simple for now. We can always add more functionality later, since the users won't notice if the Sequence type changes internally.

62.2. Implementation

The implementation of an ADT in C is typically contained in one (or sometimes more than one) .c file. This file can be compiled and linked into any program that needs to use the ADT. Here is our implementation of Sequence:

62.2.1. sequence.c

   1 #include <stdlib.h>
   2 
   3 #include "sequence.h"
   4 
   5 struct sequence {
   6     int next;   /* next value to return */
   7     int step;   /* how much to increment next by */
   8 };
   9 
  10 Sequence
  11 seq_create(int init)
  12 {
  13     return seq_create_step(init, 1);
  14 }
  15 
  16 Sequence
  17 seq_create_step(int init, int step)
  18 {
  19     Sequence s;
  20 
  21     s = malloc(sizeof(*s));
  22     if(s == 0) return 0;
  23     s->next = init;
  24     s->step = step;
  25     return s;
  26 }
  27 
  28 void
  29 seq_destroy(Sequence s)
  30 {
  31     free(s);
  32 }
  33 
  34 int
  35 seq_next(Sequence s)
  36 {
  37     int ret;            /* saves the old value before we increment it */
  38 
  39     ret = s->next;
  40     s->next += s->step;
  41 
  42     return ret;
  43 }
sequence.c

Things to note here: the definition of struct sequence appears only in this file; this means that only the functions defined here can (easily) access the next and step components. This protects Sequences to a limited extent from outside interference, and defends against users who might try to "violate the abstraction boundary" by examining the components of a Sequence directly. It also means that if we change the components or meaning of the components in struct sequence, we only have to fix the functions defined in sequence.c.

62.3. Compiling and linking

Now that we have sequence.h and sequence.c, how do we use them? Let's suppose we have a simple main program:

62.3.1. main.c

   1 #include <stdio.h>
   2 
   3 #include "sequence.h"
   4 
   5 
   6 void
   7 seq_print(Sequence s, int limit)
   8 {
   9     int i;
  10 
  11     for(i = seq_next(s); i < limit; i = seq_next(s)) {
  12         printf("%d\n", i);
  13     }
  14 }
  15 
  16 
  17 int
  18 main(int argc, char **argv)
  19 {
  20     Sequence s;
  21     Sequence s2;
  22 
  23     puts("Stepping by 1:");
  24 
  25     s = seq_create(0);
  26     seq_print(s, 5);
  27     seq_destroy(s);
  28 
  29     puts("Now stepping by 3:");
  30 
  31     s2 = seq_create_step(1, 3);
  32     seq_print(s2, 20);
  33     seq_destroy(s2);
  34 
  35     return 0;
  36 }
main.c

We can compile main.c and sequence.c together into a single binary with the command gcc main.c sequence.c. Or we can build a Makefile which will compile the two files separately and then link them. Using make may be more efficient, especially for large programs consisting of many components, since if we make any changes make will only recompile those files we have changed. So here is our Makefile:

62.3.2. Makefile

   1 CC=gcc
   2 CFLAGS=-g3 -ansi -pedantic -Wall
   3 
   4 all: seqprinter
   5 
   6 seqprinter: main.o sequence.o
   7         $(CC) $(CFLAGS) -o $@ $^
   8 
   9 test: seqprinter
  10         ./seqprinter
  11 
  12 # these rules say to rebuild main.o and sequence.o if sequence.h changes
  13 main.o: main.c sequence.h
  14 sequence.o: sequence.c sequence.h
  15 
  16 clean:
  17         $(RM) -f seqprinter *.o

Makefile

And now running make test produces this output. Notice how the built-in make variables $@ and $^ expand out to the left-hand side and right-hand side of the dependency line for building seqprinter.

$ make test
gcc -g3 -ansi -pedantic -Wall   -c -o main.o main.c
gcc -g3 -ansi -pedantic -Wall   -c -o sequence.o sequence.c
gcc -g3 -ansi -pedantic -Wall -o seqprinter main.o sequence.o
./seqprinter
Stepping by 1:
0
1
2
3
4
Now stepping by 3:
1
4
7
10
13
16
19

63. Designing abstract data types

Now we've seen how to implement an abstract data type. How do we choose when to use when, and what operations to give it? Let's try answering the second question first.

63.1. Parnas's Principle

Parnas's Principle is a statement of the fundamental idea of information hiding, which says that abstraction boundaries should be as narrow as possible:

  • The developer of a software component must provide the intended user with all the information needed to make effective use of the services provided by the component, and should provide no other information.
  • The developer of a software component must be provided with all the information necessary to carry out the given responsibilities assigned to the component, and should be provided with no other information.

(David Parnas, "On the Criteria to Be Used in Decomposing Systems into Modules," Communications of the ACM, 15(12): 1059--1062, 1972.)

For ADTs, this means we should provide as few functions for accessing and modifying the ADT as we can get away with. The Sequence type we defined early has a particularly narrow interface; the developer of Sequence (whoever is writing sequence.c) needs to know nothing about what its user wants except for the arguments passed in to seq_create or seq_create_step, and the user only needs to be able to call seq_next. More complicated ADTs might provide larger sets of operations, but in general we know that an ADT provides a successful abstraction when the operations are all "natural" ones given our high-level description. If we find ourselves writing a lot of extra operations to let users tinker with the guts of our implementation, that may be a sign that either we aren't taking our abstraction barrier seriously enough, or that we need to put the abstraction barrier in a different place.

63.2. When to build an abstract data type

The short answer: Whenever you can.

A better answer: The best heuristic I know for deciding what ADTs to include in a program is to write down a description of how your program is going to work. For each noun or noun phrase in the description, either identify a built-in data type to implement it or design an abstract data type.

For example: a grade database maintains a list of students, and for each student it keeps a list of grades. So here we might want data types to represent:

  • A list of students,
  • A student,
  • A list of grades,
  • A grade.

If grades are simple, we might be able to make them just be ints (or maybe doubles); to be on the safe side, we should probably create a Grade type with a typedef. The other types are likely to be more complicated. Each student might have in addition to his or her grades a long list of other attributes, such as a name, an email address, etc. By wrapping students up as abstract data types we can extend these attributes if we need to, or allow for very general implementations (say, by allowing a student to have an arbitrary list of keyword-attribute pairs). The two kinds of lists are likely to be examples of sequence types; we'll be seeing a lot of ways to implement these as the course progresses. If we want to perform the same kinds of operations on both lists, we might want to try to implement them as a single list data type, which then is specialized to hold either students or grades; this is not always easy to do in C, but we'll see examples of how to do this, too.

Whether or not this set of four types is the set we will finally use, writing it down gives us a place to start writing our program. We can start writing interface files for each of the data types, and then evolve their implementations and the main program in parallel, adjusting the interfaces as we find that we have provided too little (or too much) data for each component to do what it must.


CategoryProgrammingNotes

64. C/Definitions

One of the goals of programming is to make your code readable by other programmers (including your future self). An important tool for doing so is to give good names to everything. Not only can such a name document what it names, it can also be used to hide implementation details that are not interesting or that may change later.

65. Naming types

Suppose that you want to represent character strings as

   1 struct string {
   2     int length;
   3     char *data;         /* malloc'd block */
   4 };
   5 
   6 int string_length(const struct string *s);

If you later change the representation to, say, traditional null-terminated char * strings or some even more complicated type (union string **some_string[2];), you will need to go back and replace ever occurrence of struct string * in every program that uses it with the new type. Even if you don't expect to change the type, you may still get tired of typing struct string * all the time, especially if your fingers slip and give you struct string sometimes.

The solution is to use a typedef, which defines a new type name:

   1 typedef struct string *String;
   2 
   3 int string_length(String s);

The syntax for typedef looks like a variable declaration preceded by typedef, except that the variable is replaced by the new type name that acts like whatever type the defined variable would have had. You can use a name defined with typedef anywhere you could use a normal type name, as long as it is later in the source file than the typedef definition. Typically typedefs are placed in a header file (.h file) that is then included anywhere that needs them.

You are not limited to using typedefs only for complex types. For example, if you were writing numerical code and wanted to declare overtly that a certain quantity was not just any double but actually a length in meters, you could write

   1 typedef double LengthInMeters;
   2 typedef double AreaInSquareMeters;
   3 
   4 AreaInSquareMeters rectangleArea(LengthInMeters height, LengthInMeters width);

Unfortunately, C does not do type enforcement on typedef'd types: it is perfectly acceptable to the compiler if you pass a value of type AreaInSquareMeters as the first argument to rectangleArea, since by the time it checks it has replaced by AreaInSquareMeters and LengthInMeters by double. So this feature is not as useful as it might be, although it does mean that you can write rectangleArea(2.0, 3.0) without having to do anything to convert 2.0 and 3.0 to type LengthInMeters.

66. Naming constants

Suppose that you have a function (call it getchar) that needs to signal that sometimes it didn't work. The usual way is to return a value that the function won't normally return. Now, you could just tell the user what value that is:

   1 /* get a character (as an `int` ASCII code) from `stdin` */
   2 /* return -1 on end of file */
   3 int getchar(void);

and now the user can write

   1     while((c = getchar()) != -1) {
   2         ...
   3     }

But then somebody reading the code has to remember that -1 means "end of file" and not "signed version of \0xff" or "computer room on fire, evacuate immediately." It's much better to define a constant EOF that happens to equal -1, because among other things if you change the special return value from getchar later then this code will still work (assuming you fixed the definition of EOF):

   1     while((c = getchar()) != EOF) {
   2         ...
   3     }

So how do you declare a constant in C? The traditional approach is to use the C preprocessor, the same tool that gets run before the compiler to expand out #include directives. To define EOF, the file /usr/include/stdio.h includes the text

   1 #define EOF (-1)
   2 

What this means is that whenever the characters EOF appear in a C program as a separate word (e.g. in 1+EOF*3 but not in wOEFully_long_variable_name), then the preprocessor will replace them with the characters (-1). The parentheses around the -1 are customary to ensure that the -1 gets treated as a separate constant and not as part of some larger expression.

In general, any time you have a non-trivial constant in a program, it should be #defined. Examples are things like array dimensions, special tags or return values from functions, maximum or minimum values for some quantity, or standard mathematical constants (e.g., /usr/include/math.h defines M_PI as pi to umpteen digits). This allows you to write

   1     char buffer[MAX_FILENAME_LENGTH+1];
   2     
   3     area = M_PI*r*r;
   4 
   5     if(status == COMPUTER_ROOM_ON_FIRE) {
   6         evacuate();
   7     }

instead of

   1     char buffer[513];
   2     
   3     area = 3.141592319*r*r;
   4 
   5     if(status == 136) {
   6         evacuate();
   7     }

which is just an invitation to errors (including one the one on line 3).

Like typedefs, #defines that are intended to be globally visible are best done in header files; in large programs you will want to #include them in many source files. The usual convention is to write #defined names in all-caps to remind the user that they are macros and not real variables.

67. Naming values in sequences

C provides the enum construction for the special case where you want to have a sequence of named integer constants, but you don't care what their actual values are, as in

   1 enum color { RED, BLUE, GREEN, MAUVE, TURQUOISE };

This will assign the value 0 to RED, 1 to BLUE, and so on. These values are effectively of type int, although you can declare variables, arguments, and return values as type enum color to indicate their intended interpretation Despite declaring a variable enum color c (say), the compiler will still allow c to hold arbitrary values of type int; see enums_are_ints.c for some silly examples of this.

It is also possible to specify particular values for particular enumerated constants, as in

   1 enum color { RED = 37, BLUE = 12, GREEN = 66, MAUVE = 5, TURQUOISE };

Anything that doesn't get a value starts with one plus the previous value; so the above definition would set TURQUOISE to 6.

In practice, enums are seldom used, and you will more commonly see a stack of #defines:

   1 #define RED     (0)
   2 #define BLUE    (1)
   3 #define GREEN   (2)
   4 #define MAUVE   (3)
   5 #define TURQUOISE (4)
   6 

The reason for this is partly historical—enum arrived late in the evolution of C—but partly practical: a table of #defines makes it much easier to figure out which color is represented by 3, without having to count through a list. But if you never plan to use the numerical values, enum is a better choice.

68. Other uses of #define

It is also possible to use #define to define preprocessor macros that take parameters; this will be discussed in C/Macros.


CategoryProgrammingNotes

69. AsymptoticNotation

70. Definitions

O(f(n))

A function g(n) is in O(f(n)) ("big O of f(n)") if there exist constants c and N such that |g(n)| ≤ c |f(n)| for n > N.

Ω(f(n))

A function g(n) is in Ω(f(n)) ("big Omega of f(n)") if there exist constants c and N such that |g(n)| >= c |f(n)| for n > N.

Θ(f(n))

A function g(n) is in Θ(f(n)) ("big Theta of f(n)") if there exist constants c1, c2, and N such that c1|f(n)| ≤ |g(n)| ≤ c2|f(n)| for n > N.

o(f(n))

A function g(n) is in o(f(n)) ("little o of f(n)") if for every c > 0 there exists an N such that |g(n)| ≤ c |f(n)| for n > N. This is equivalent to saying that limn → ∞ g(n)/f(n) = 0.

ω(f(n))

A function g(n) is in ω(f(n) ("little omega of f(n)") if for every c > 0 there exists an N such that |g(n)| >= c |f(n)| for n > N. This is equivalent to saying that limn → ∞ |g(n)|/|f(n)| diverges to infinity.

71. Motivating the definitions

Why would we use this notation?

  • Constant factors vary from one machine to another. The c factor hides this. If we can show that an algorithm runs in O(n2) time, we can be confident that it will continue to run in O(n2) time no matter how fast (or how slow) our computers get in the future.

  • For the N threshold, there are several excuses:
    • Any problem can theoretically be made to run in O(1) time for any finite subset of the possible inputs (e.g. all inputs expressible in 50 MiB or less), by prefacing the main part of the algorithm with a very large TableLookup. So it's meaningless to talk about the relative performance of different algorithms for bounded inputs.

    • If f(n) > 0 for all n, then we can get rid of N (or set it to zero) by making c larger enough. But some f(n) take on zero—or undefined—values for interesting n (e.g., f(n) = n2 is zero when n is zero, and f(n) = log(n) is undefined for n = 0 and zero for n = 1). Allowing the minimum N lets us write O(n2) or O(log n) for classes of functions that we would otherwise have to write more awkwardly as something like O(n2+1) or O(log (n+2)).

    • Putting the n > N rule in has a natural connection with the definition of a limit, where the limit as n goes to infinity of g(n) is defined to be x if for each epsilon > 0 there is an N such that |g(n)-x| < epsilon for n > N. Among other things, this permits the limit test that says g(n) = O(f(n)) if the limit as n goes to infinity of g(n)/f(n) exists and is finite.

72. Proving asymptotic bounds

Most of the time when we use asymptotic notation, we compute bounds using stock theorems like O(f(n)) + O(g(n)) = O(max(f(n), g(n)) or O(c f(n)) = O(f(n)). But sometimes we need to unravel the definitions to see whether a given function fits in a give class, or to prove these utility theorems to begin with. So let's do some examples of how this works.

  • The function n is in O(n3).

    Proof

    We must find c, N such that for all n > N, |n| < c|n3|. Since n3 is much bigger than n for most values of n, we'll pick c to be something convenient to work with, like 1. So now we need to choose N so that for all n > N, |n| < |n3|. It is not the case that |n| < |n3| for all n (try plotting n vs n3 for n < 1) but if we let N = 1, then we have n > 1, and we just need to massage this into n3 > n. There are a couple of ways to do this, but the quickest is probably to observe that squaring and multiplying by n (a positive quantity) are both increasing functions, which means that from n > 1 we can derive n2 > 12 = 1 and then n2 * n = n3 > 1 * n = n.

  • The function n3 is not in O(n).

    Proof

    Here we need to negate the definition of O(n), a process that turns all existential quantifiers into universal quantifiers and vice versa. So what we need to show is that for all c, N, there exists some n > N for which |n3| is not less than c |n|. We can ignore any c ≤ 0 since |n| is always positive. So fix some c > 0 and N. We must find an n > N for which n3 > c n. Solving for n in this inequality gives n > c1/2; so setting n > max(N, c1/2) finishes the proof.

  • If f1(n) is in O(g(n)) and f2(n) is in O(g(n)), then f1(n)+f2(n) is in O(g(n)).

    Proof

    Since f1(n) is in O(g(n)), there exist constants c1, N1 such that for all n > N1, |f1(n)| < c |g(n)|. Similarly there exist c2, N2 such that for all n > N2, |f2(n)| < c |g(n)|. To show f1(n)+f2(n) in O(g(n)), we must find constants c and N such that for all n > N, |f1(n)+f2(n)| < c |g(n)|. Let's let c = c1+c2. Then if n is greater than max(N1, N2), it is greater than both N1 and N2, so we can add together |f1| < c1|g| and |f2| < c2|g| to get |f1+f2| ≤ |f1| + |f2| < (c1+c2) |g| = c |g|.

73. Asymptotic notation hints

73.1. Remember the difference between big-O, big-Ω, and big-Θ

  • Use big-O when you have an upper bound on a function, e.g. the zoo never got more than O(1) new gorillas per year, so there were at most O(t) gorillas at the zoo in year t.
  • Use big-Ω when you have a lower bound on a function, e.g. every year the zoo got at least one new gorilla, so there were at least Ω(t) gorillas at the zoo in year t.
  • Use big-Θ when you know the function exactly to within a constant-factor error, e.g. every year the zoo got exactly five new gorillas, so there were Θ(t) gorillas at the zoo in year t.

For the others, use little-o and ω when one function becomes vanishingly small relative to the other, e.g. new gorillas arrived rarely and with declining frequency, so there were o(t) gorillas at the zoo in year t. These are not used as much as big-O, big-Ω, and big-Θ in the algorithms literature.

73.2. Simplify your asymptotic terms as much as possible

  • O(f(n)) + O(g(n)) = O(f(n)) when g(n) = O(f(n)). If you have an expression of the form O(f(n) + g(n)), you can almost always rewrite it as O(f(n)) or O(g(n)) depending on which is bigger. The same goes for Ω or Θ.
  • O(c f(n)) = O(f(n)) if c is a constant. You should never have a constant inside a big O. This includes bases for logarithms: since loga x = logb x / logb a, you can always rewrite O(lg n), O(ln n), or O(log1.4467712 n) as just O(log n).

  • But watch out for exponents and products: O(3n n3.1178 log1/3 n) is already as simple as it can be.

73.3. Remember the limit trick

If you are confused whether e.g. log n is O(f(n)), try computing the limit as n goes to infinity of (log n)/n, and see if it's a constant (zero is ok). You may need to use L'Hôpital's Rule to evaluate such limits if they aren't obvious.

74. Variations in notation

As with many tools in mathematics, you may see some differences in how asymptotic notation is defined and used.

74.1. Absolute values

Some authors leave out the absolute values. For example, BiggsBook defines f(n) as being in O(g(n)) if f(n) ≤ c g(n) for sufficiently large n. If f(n) and g(n) are non-negative, this is not an unreasonable definition. But it produces odd results if either can be negative: for example, by this definition, -n1000 is in O(n2). (Some authors define O(), Ω(), Θ() only for non-negative functions, avoiding this problem.)

The most common definition (which we will use) says that f(n) is in O(g(n)) if |f(n)| ≤ c |g(n)| for sufficiently large n; by this definition -n1000 is not in O(n2), though it is in O(n1000). This definition was designed for error terms in asymptotic expansions of functions, where the error term might represent a positive or negative error.

You can usually assume that algorithm running times are non-negative, so dropping the absolute value signs is generally harmless in algorithm analysis, but you should remember the absolute value definition in case you run into O() in other contexts.

74.2. Abusing the equals sign

Formally, we can think of O(g(n)) as a predicate on functions, which is true of all functions f(n) that satisfy f(n) ≤ c g(n) for some c and sufficiently large n. This requires writing that n2 is O(n2) where most computer scientists or mathematicians would just write n2 = O(n2). Making sense of the latter statement involves a standard convention that is mildly painful to define formally but that greatly simplifies asymptotic analyses.

Let's take a statement like the following:

  • O(n2) + O(n3) + 1 = O(n3).

What we want this to mean is that the left-hand side can be replaced by the right-hand side without causing trouble. To make this work formally, we define the statement as meaning:

  • For any f in O(n2) and any g in O(n3), there exists an h in O(n3) such that f(n) + g(n) + 1 = h(n).

In general, any appearance of O(), Ω(), or Θ() on the left-hand side gets a universal quantifier (for all) and any appearance of O(), Ω(), or Θ() on the right-hand side gets an existential quantifier (there exists). So

  • f(n) + o(f(n)) = Θ(f(n))

becomes

  • For any g in o(f(n)), there exists an h in Θ(f(n)) such that f(n)+g(n)=h(n).

and

  • O(f(n))+O(g(n))+1 = O(max(f(n),g(n)))+1

becomes

  • For any r in O(f(n)) and s in O(g(n)), there exists t in O(max(f(n),g(n)) such that r(n)+s(n)+1=t(n)+1.

The nice thing about this definition is that as long as you are careful about the direction the equals sign goes in, you can treat these complicated pseudo-equations like ordinary equations. For example, since O(n2) + O(n3) = O(n3), we can write

  • n2/2 + n(n+1)(n+2)/6 = n2/2 + O(n3) = O(n2) + O(n3) = O(n3),

which is much simpler than what it would look like if we had to talk about particular functions being elements of particular sets of functions.

This is an example of abuse of notation, the practice of redefining some standard bit of notation (in this case, equations) to make calculation easier. It's generally a safe practice as long as everybody understands what is happening. But beware of applying facts about unabused equations to the abused ones. Just because O(n2) = O(n3) doesn't mean O(n3) = O(n2)—the big-O equations are not reversible the way ordinary equations are.

More discussion of this can be found in CormenEtAl.

75. More information


CategoryAlgorithmNotes CategoryMathNotes CategoryProgrammingNotes

76. LinkedLists

Linked lists are about the simplest data structure beyond arrays. They aren't very efficient for many purposes, but have very good performance for certain specialized applications.

77. Stacks and linked lists

On my desk is a pile of books that I am supposed to put away on my bookshelf. I don't care much about how they are organized, I just want to be able to dump a book quickly so that I can later go through and put all of them back at once. Because it's easiest just to dump each book on the top of the pile, I have effectively built a data structure called a stack, which supports a push operation (add a book to the top of the pile) and a pop operation (take the top book off and return it).

To build a stack in a program, I can adopt a similar strategy. When a new item appears, I will box it up inside a struct that will also include a pointer to the item below it in the stack. Since that item will point to the item below it, and so on, I end up with an arbitrarily long chain of pointers (usually ending in a null pointer).

The cost of push and pop operations are both O(1): they don't depend on the number of elements in the stack, since they are only working on the top element or two. Doing almost anything else (e.g., finding the k-th element of the stack or searching for a particular value) is likely to be much more expensive, O(n) in the worst case. The reason is that unlike an array, a linked list scatters its contents throughout memory, and the only way to get at a particular element is to crawl through all the ones that precede it.

77.1. Implementation

A very concrete implementation of a stack using a linked list might look like this:

   1 #include <stdlib.h>
   2 #include <string.h>
   3 
   4 /* Functional-style stack */
   5 /* All operations consume the old value and return the new value of the stack */
   6 
   7 typedef struct stack {
   8     char *book;         /* malloc'd name of book */
   9     struct stack *next; /* next item in stack, or 0 if there aren't any more */
  10 } *Stack;
  11 
  12 #define EMPTY_STACK (0)
  13 
  14 /* create a new empty stack */
  15 Stack
  16 stackCreate(void)
  17 {
  18     return EMPTY_STACK;
  19 }
  20 
  21 /* push a new book on the stack */
  22 /* copies second argument book (so you don't need to keep it around) */
  23 /* returns 0 on allocation error */
  24 Stack
  25 stackPush(Stack old, const char *book)
  26 {
  27     Stack new;
  28 
  29     new = malloc(sizeof(*new));
  30     if(new == 0) return 0;
  31 
  32     new->next = old;
  33     new->book = strdup(book);
  34 
  35     if(new->book == 0) {
  36         free(new);
  37         return 0;
  38     }
  39 
  40     return new;
  41 }
  42 
  43 /* pops a book off the stack */
  44 /* returns new tail of stack, stores book in *book */
  45 /* *book is malloc'd data that the caller should free */
  46 /* Stores 0 in *book if stack is empty */
  47 Stack
  48 stackPop(Stack old, char **book /*RETVAL*/)
  49 {
  50     Stack new;
  51 
  52     if(old == 0) {
  53         *book = 0;
  54         return 0;
  55     } else {
  56         new = old->next;
  57         *book = old->book;
  58         free(old);
  59         return new;
  60     }
  61 }
  62 
  63 /* frees a stack */
  64 void
  65 stackDestroy(Stack s)
  66 {
  67     char *book;
  68 
  69     while(s) {
  70         s = stackPop(s, &book);
  71         free(book);
  72     }
  73 }
functional_stack.c

The meat of this implementation is in stackPush and stackPop. These act like a constructor/destructor pair for individual stack elements. The stackPush funciton also does the work of linking the new element into the stack and copying the book argument. The choice to copy book is a design decision: it insulates the contents of the stack from any changes that might happen to book after stackPush exits (perhaps book is a buffer that is reused to read a new line from the input after each call to stackPush), but it may be more than what the user needs, and it forces the user to free the returned value in stackPop. An alternative would be to let the user call strdup for themselves.

77.2. A more complicated implementation

The simple implementation of a stack above doesn't really act like a single object; instead, it acts more like a family of constant values you can do arithmetic on, where applying stackPush or stackPop consumes some old value and generates a new one. For many applications, it is more natural to just think of having a single stack that changes over time.

We can do this with another layer of indirection. Instead of having a Stack be a pointer to the first element, we'll have it be a pointer to a pointer to the first element. This requires a little bit more work inside the implementation but looks nicer from the outside.

   1 #include <stdlib.h>
   2 #include <string.h>
   3 
   4 /* Imperative-style stack */
   5 /* All operations modify the stack in place */
   6 
   7 /* a Stack is a pointer to a pointer to the first element of a linked list */
   8 typedef struct stack {
   9     char *book;         /* malloc'd name of book */
  10     struct stack *next; /* next item in stack, or 0 if there aren't any more */
  11 } **Stack;
  12 
  13 /* create a new empty stack */
  14 Stack
  15 stackCreate(void)
  16 {
  17     Stack s;
  18     
  19     s = malloc(sizeof(struct stack *));
  20     *s = 0;
  21 
  22     return s;
  23 }
  24 
  25 /* push a new book on the stack */
  26 /* copies second argument book (so you don't need to keep it around) */
  27 /* returns 0 on allocation error or 1 on success */
  28 int
  29 stackPush(Stack s, const char *book)
  30 {
  31     struct stack *new;
  32 
  33     new = malloc(sizeof(*new));
  34 
  35     new->next = *s;
  36     new->book = strdup(book);
  37 
  38     if(new->book == 0) {
  39         free(new);
  40         return 0;
  41     }
  42 
  43     *s = new;
  44     return 1;
  45 }
  46 
  47 /* pops a book off the stack */
  48 /* returns 0 if stack is empty */
  49 char *
  50 stackPop(Stack s)
  51 {
  52     struct stack *new;
  53     char *book;
  54 
  55     if(*s == 0) {
  56         return 0;
  57     } else {
  58         book = (*s)->book;
  59 
  60         /* we have to save (*s)->next before we free it */
  61         new = (*s)->next;
  62         free(*s);
  63         *s = new;
  64         return book;
  65     }
  66 }
  67 
  68 /* returns true if s is empty */
  69 int
  70 stackEmpty(Stack s)
  71 {
  72     return (*s) == 0;
  73 }
  74 
  75 /* frees a stack */
  76 void
  77 stackDestroy(Stack s)
  78 {
  79     while(!stackEmpty(s)) {
  80         free(stackPop(s));
  81     }
  82 
  83     free(s);
  84 }
object_stack.c

Here we have added a new stackEmpty routine because it's no longer obvious how to check (and we might someday want to change Stack to some other type where testing (*s) == 0 is no longer the right way to do it).

Note that the structure of the linked list is exactly the same in both implementations: the only difference is that the "imperative" version adds an extra level of indirection at the beginning. Yet another alternative is to have a dummy stack element at the beginning and do pushes and pops starting with the second place in the stack (i.e., replace (*s) with s->next in the second implementation). This consumes more space (we end up allocating space to hold s->book even though we don't put anything there), but can make things simpler for some more complicated linked-list structures.

77.3. Building a stack out of an array

When the elements of a stack are small, or when a maximum number of elements is known in advance, it often makes sense to build a stack from an array (with a variable storing the index of the top element) instead of a linked list. The reason is that pushes and pops only require updating the stack pointer instead of calling malloc or free to allocate space, and pre-allocating is almost always faster than allocating as needed. This is the strategy used to store the function call stack in almost all programs (the exception is in languages like Scheme, where the call stack is allocated on the heap because stack frames may outlive the function call that creates them).

78. Looping over a linked list

Looping over a linked list is not hard if you have access to the next pointers. (For a more abstract way to do this see C/Iterators.)

Let's imagine somebody gave us a pointer to the first struct stack in a list; call this pointer first. Then we can write a loop like this that prints the contents of the stack:

   1 void
   2 stackPrint(struct stack *first)
   3 {
   4     struct stack *elt;
   5 
   6     for(elt = first; elt != 0; elt = elt->next) {
   7         puts(elt->book);
   8     }
   9 }

There's not a whole lot to notice here except that for is perfectly happy to iterate over something that isn't a range of integers. The running time is linear in the length of the list (O(n)).

79. Looping over a linked list backwards

What if we want to loop over a linked list backwards? The next pointers all go the wrong way, so we have to save a trail of breadcrumbs to get back. The safest way to do this is to reverse the original list into an auxiliary list:

   1 void
   2 stackPrintReversed(struct stack *first)
   3 {
   4     struct stack *elt;
   5     Stack s2;                   /* uses imperative implementation */
   6 
   7     s2 = stackCreate();
   8 
   9     for(elt = first; elt != 0; elt = elt->next) {
  10         s2 = stackPush(s2, elt->book);
  11     }
  12 
  13     stackPrint(s2);
  14     stackDestroy(s2);
  15 }

Pushing all the elements from the first list onto s2 puts the first element on the bottom, so when we print s2 out, it's in the reverse order of the original stack.

We can also write a recursive function that prints the elements backwards. This function effectively uses the function call stack in place of the extra stack s2 above.

   1 void
   2 stackPrintReversedRecursive(struct stack *first)
   3 {
   4     if(first != 0) {
   5         /* print the rest of the stack */
   6         stackPrintReversedRecursive(first->next);
   7 
   8         /* then print the first element */
   9         puts(first->book);
  10     }
  11 }

The code in stackPrintReversedRecursive is shorter than the code in stackPrintReversed, and it is likely to be faster since it doesn't require allocating a second stack and copying all the elements. But it will only work for small stacks: because the function call stack is really a fixed-size array, if the input to stackPrintReversedRecursive is too big the recursion will go too deep cause a stack overflow.

If we want to do this sort of thing a lot, we should build a doubly-linked list, with a pointer in each element both to the next element and the previous element instead of a singly-linked list (see below for more).

80. Queues

Stacks are last-in-first-out (LIFO) data structures: when we pop, we get the last item we pushed. What if we want a first-in-first-out (FIFO) data structure? Such a data structure is called a queue and can also be implemented by a linked list. The difference is that if we want O(1) time for both the enqueue (push) and dequeue (pop) operations, we must keep around pointers to both ends of the linked list.

Here is a simple queue holding ints:

   1 #include <stdlib.h>
   2 
   3 struct elt {
   4     int item;
   5     struct elt *next;
   6 };
   7 
   8 struct queue {
   9     struct elt *head;   /* first element in queue, or 0 if empty */
  10     struct elt *tail;   /* last element in queue */
  11 };
  12 
  13 typedef struct queue *Queue;
  14 
  15 Queue
  16 queueCreate(void)
  17 {
  18     Queue q;
  19 
  20     q = malloc(sizeof(*q));
  21 
  22     if(q) {
  23         q->head = 0;
  24     }
  25 
  26     return q;
  27 }
  28 
  29 void
  30 enqueue(Queue q, int item)
  31 {
  32     struct elt *e;
  33 
  34     e = malloc(sizeof(*e));
  35     if(e == 0) abort();
  36 
  37     e->item = item;
  38     e->next = 0;
  39     
  40     if(q->head == 0) {
  41         /* special case for empty queue */
  42         q->head = q->tail = e;
  43     } else {
  44         /* patch e in after tail */
  45         q->tail->next = e;
  46         q->tail = e;
  47     }
  48 }
  49 
  50 int
  51 queueEmpty(Queue q)
  52 {
  53     return q->head == 0;
  54 }
  55 
  56 #define EMPTY_QUEUE (-1)
  57 
  58 /* returns EMPTY_QUEUE if queue is empty */
  59 int
  60 dequeue(Queue q)
  61 {
  62     struct elt *e;
  63     int retval;
  64 
  65     if(queueEmpty(q)) {
  66         return EMPTY_QUEUE;
  67     } else {
  68         /* pop first element off */
  69         e = q->head;
  70         q->head = e->next;
  71 
  72         /* save its contents and free it */
  73         retval = e->item;
  74         free(e);
  75 
  76         return retval;
  77     }
  78 }
queue.c

It is a bit trickier to build a queue out of an array than to build a stack. The difference is that while a stack pointer can move up and down, leaving the base of the stack in the same place, a naive implementation of a queue would have head and tail pointers both marching ever onward across the array leaving nothing but empty cells in their wake. While it is possible to have the pointers wrap around to the beginning of the array when they hit the end, if the queue size is unbounded the tail pointer will eventually catch up to the head pointer. At this point (as in a stack that overflows), it is necessary to allocate more space and copy the old elements over.

81. Deques and doubly-linked lists

Suppose we want a data structure that represents a line of elements where we can push or pop elements at either end. Such a data structure is known as a deque (pronounced like "deck"), and can be implemented with all operations taking O(1) time by a doubly-linked list, where each element has a pointer to both its successor and its predecessor.

An ordinary singly-linked list is not good enough. The reason is that even if we keep a pointer to both ends as in a queue, when it comes time to pop an element off the tail, we have no pointer to its predecessor ready to hand; the best we can do is scan from the head until we get to an element whose successor is the tail, which takes O(n) time.

The solution is to build something like the code below. To avoid special cases, we use a lot of 2-element arrays of pointers, so that pushing or popping from either end of the queue is completely symmetric.

   1 typedef struct deque *Deque;
   2 
   3 #define FRONT (0)
   4 #define BACK (1)
   5 
   6 #define NUM_DIRECTIONS (2)
   7 
   8 #define EMPTY_QUEUE (-1)
   9 
  10 Deque dequeCreate(void);
  11 
  12 void dequePush(Deque d, int direction, int value);
  13 
  14 int dequePop(Deque d, int direction);
  15 
  16 int dequeEmpty(Deque d);
  17 
  18 void dequeDestroy(Deque d);
deque.h

   1 #include <stdlib.h>
   2 #include <assert.h>
   3 
   4 #include "deque.h"
   5 
   6 struct elt {
   7     int value;
   8     struct elt *next[NUM_DIRECTIONS];
   9 };
  10 
  11 struct deque {
  12     struct elt *head[NUM_DIRECTIONS];
  13 };
  14 
  15 Deque
  16 dequeCreate(void)
  17 {
  18     Deque d;
  19 
  20     d = malloc(sizeof(*d));
  21     if(d) {
  22         d->head[FRONT] = d->head[BACK] = 0;
  23     } 
  24 
  25     return d;
  26 }
  27 
  28 void
  29 dequePush(Deque d, int direction, int value)
  30 {
  31     struct elt *e;
  32 
  33     assert(direction == FRONT || direction == BACK);
  34 
  35     e = malloc(sizeof(*e));
  36     if(e == 0) abort();  /* kill process on malloc failure */
  37     
  38     e->next[direction] = d->head[direction];
  39     e->next[!direction] = 0;
  40     e->value = value;
  41 
  42     if(d->head[direction]) {
  43         d->head[direction]->next[!direction] = e;
  44     }
  45 
  46     d->head[direction] = e;
  47 
  48     /* special case for empty deque */
  49     /* e becomes the head on both sides */
  50     if(d->head[!direction] == 0) {
  51         d->head[!direction] = e;
  52     }
  53 }
  54 
  55 int
  56 dequePop(Deque d, int direction)
  57 {
  58     struct elt *e;
  59     int retval;
  60 
  61     assert(direction == FRONT || direction == BACK);
  62 
  63     e = d->head[direction];
  64 
  65     if(e == 0) return EMPTY_QUEUE;
  66 
  67     /* else remove it */
  68     d->head[direction] = e->next[direction];
  69 
  70     if(d->head[direction] != 0) {
  71         d->head[direction]->next[!direction] = 0;
  72     } else { 
  73         /* special case when e is the last element */
  74         /* clear other head field */
  75         d->head[!direction] = 0;
  76     }
  77 
  78     retval = e->value;
  79 
  80     free(e);
  81 
  82     return retval;
  83 }
  84 
  85 int
  86 dequeEmpty(Deque d)
  87 {
  88     return d->head[FRONT] == 0;
  89 }
  90 
  91 void
  92 dequeDestroy(Deque d)
  93 {
  94     while(!dequeEmpty(d)) {
  95         dequePop(d, FRONT);
  96     }
  97 
  98     free(d);
  99 }
deque.c

And here is some test code: test_deque.c, Makefile.

82. Circular linked lists

For some applications, there is no obvious starting or ending point to a list, and a circular list (where the last element points back to the first) may be appropriate. Circular doubly-linked lists can also be used to build deques; a single pointer into the list tracks the head of the deque, with some convention adopted for whether the head is an actual element of the list (at the front, say, with its left neighbor at the back) or a dummy element that is not considered to be part of the list.

The selling point of circular doubly-linked lists as a concrete data structure is that insertions and deletions can be done anywhere in the list with only local information. For example, here are some routines for manipulating a doubly-linked list directly. We'll make our lives easy and assume (for the moment) that the list has no actual contents to keep track of.

   1 #include <stdlib.h>
   2 
   3 /* directions for doubly-linked list next pointers */
   4 #define RIGHT (0)
   5 #define LEFT (1)
   6 
   7 struct elt {
   8     struct elt *next[2];
   9 };
  10 
  11 typedef struct elt *Elt;
  12 
  13 /* create a new circular doubly-linked list with 1 element */
  14 /* returns 0 on allocation error */
  15 Elt
  16 listCreate(void)
  17 {
  18     Elt e;
  19 
  20     e = malloc(sizeof(*e));
  21     if(e) {
  22         e->next[LEFT] = e->next[RIGHT] = e;
  23     }
  24 
  25     return e;
  26 }
  27 
  28 /* remove an element from a list */
  29 /* Make sure you keep a pointer to some other element! */
  30 /* does not free the removed element */
  31 void
  32 listRemove(Elt e)
  33 {
  34     /* splice e out */
  35     e->next[RIGHT]->next[LEFT] = e->next[LEFT];
  36     e->next[LEFT]->next[RIGHT] = e->next[RIGHT];
  37 }
  38     
  39 /* insert an element e into list in direction dir from head */
  40 void
  41 listInsert(Elt head, int dir, Elt e)
  42 {
  43     /* fill in e's new neighbors */
  44     e->next[dir] = head->next[dir];
  45     e->next[!dir] = head;
  46 
  47     /* make neigbhors point back at e */
  48     e->next[dir]->next[!dir] = e;
  49     e->next[!dir]->next[dir] = e;
  50 }
  51 
  52 /* split a list, removing all elements between e1 and e2 */
  53 /* e1 is the leftmost node of the removed subsequence, e2 rightmost */
  54 /* the removed elements are formed into their own linked list */
  55 /* comment: listRemove could be implemented as listSplit(e,e) */
  56 void
  57 listSplit(Elt e1, Elt e2)
  58 {
  59     /* splice out the new list */
  60     e2->next[RIGHT]->next[LEFT] = e1->next[LEFT];
  61     e1->next[LEFT]->next[RIGHT] = e2->next[RIGHT];
  62 
  63     /* fix up the ends */
  64     e2->next[RIGHT] = e1;
  65     e1->next[LEFT] = e2;
  66 }
  67 
  68 /* splice a list starting at e2 after e1 */
  69 /* e2 becomes e1's right neighbor */
  70 /* e2's left neighbor becomes left neighbor of e1's old right neighbor */
  71 void
  72 listSplice(Elt e1, Elt e2)
  73 {
  74     /* fix up tail end */
  75     e2->next[LEFT]->next[RIGHT] = e1->next[RIGHT];
  76     e1->next[RIGHT]->next[LEFT] = e2->next[LEFT];
  77 
  78     /* fix up e1 and e2 */
  79     e1->next[RIGHT] = e2;
  80     e2->next[LEFT] = e1;
  81 }
  82 
  83 /* free all elements of the list containing e */
  84 void
  85 listDestroy(Elt e)
  86 {
  87     Elt target;
  88     Elt next;
  89 
  90     /* we'll free elements until we get back to e, then free e */
  91     /* note use of pointer address comparison to detect end of loop */
  92     for(target = e->next[RIGHT]; target != e; target = next) {
  93         next = target->next[RIGHT];
  94         free(target);
  95     }
  96 
  97     free(e);
  98 }
circular.c

The above code might or might not actually work. What if it doesn't? It may make sense to include some sanity-checking code that we can run to see if our pointers are all going to the right place:

   1 /* assert many things about correctness of the list */
   2 /* Amazingly, this is guaranteed to abort or return no matter
   3    how badly screwed up the list is. */
   4 void
   5 listSanityCheck(Elt e)
   6 {
   7     Elt check;
   8 
   9     assert(e != 0);
  10 
  11     check = e;
  12 
  13     do {
  14 
  15         /* are our pointers consistent with our neighbors? */
  16         assert(check->next[RIGHT]->next[LEFT] == check);
  17         assert(check->next[LEFT]->next[RIGHT] == check);
  18 
  19         /* on to the next */
  20         check = check->next[RIGHT];
  21 
  22     } while(check != e);
  23 }

What if we want to store something in this list? The simplest approach is to extend the definition of struct elt:

   1 struct elt {
   2     struct elt *next[2];
   3     char *name;
   4     int socialSecurityNumber;
   5     int gullibility;
   6 };

But then we can only use the code for one particular type of data. An alternative approach is to define a new Elt-plus struct:

   1 struct fancyElt {
   2     struct elt *next[2];
   3     char *name;
   4     int socialSecurityNumber;
   5     int gullibility;
   6 };

and then use pointer casts to pretend the fancy structs into Elts:

   1     struct fancyElt *e;
   2 
   3     e = malloc(sizeof(*e));
   4 
   5     /* fill in fields on e */
   6 
   7     listInsert(someList, (Elt) e);

The trick here is that as long as the initial part of the struct fancyElt looks like a struct elt, any code that expects a struct elt will happily work with it and ignore the fields that happen to be sitting later in memory. (This trick is how C++ inheritance works.)

The downside is that if something needs to be done with the other fields (e.g freeing e->name if e is freed), then the Elt functions won't know to do this. So if you use this trick you should be careful.

A similar technique using void * pointers is described in C/GenericContainers.

83. What linked lists are and are not good for

Linked lists are good for any task that involves inserting or deleting elements next to an element you already have a pointer to; such operations can usually be done in O(1) time. They generally beat arrays (even resizeable arrays) if you need to insert or delete in the middle of a list, since an array has to copy any elements above the insertion point to make room; if inserts or deletes always happen at the end, an array may be better.

Linked lists are not good for any operation that requires random access, since reaching an arbitrary element of a linked list takes as much as O(n) time. For such applications, arrays are better if you don't need to insert in the middle; if you do, you should use some sort of tree (see BinaryTrees).

84. Further reading

KernighanPike gives an example of a linked list in Section 2.7. A description of many different kinds of linked lists with pictures can be found at Linked_list.


CategoryProgrammingNotes

85. C/Debugging

86. Debugging in general

Basic method of all debugging:

  1. Know what your program is supposed to do.
  2. Detect when it doesn't.
  3. Fix it.

A tempting mistake is to skip step 1, and just try randomly tweaking things until the program works. Better is to see what the program is doing internally, so you can see exactly where and when it is going wrong. A second temptation is to attempt to intuit where things are going wrong by staring at the code or the program's output. Avoid this temptation as well: let the computer tell you what it is really doing inside your program instead of guessing.

87. Assertions

Every non-trivial C program should include <assert.h>, which gives you the assert macro (see KernighanRitchie Appendix B6). The assert macro tests if a condition is true and halts your program with an error message if it isn't:

   1 #include <assert.h>
   2 
   3 int
   4 main(int argc, char **argv)
   5 {
   6     assert(2+2 == 5);
   7     return 0;
   8 }
no.c

$ gcc -o no no.c
$ ./no
no: no.c:6: main: Assertion `2+2 == 5' failed.

Line numbers and everything, even if you compile with the optimizer turned on. Much nicer than a mere segmentation fault, and if you run it under the debugger, the debugger will stop exactly on the line where the assert failed so you can poke around and see why.

88. gdb

The standard debugger on Linux is called gdb. This lets you run your program under remote control, so that you can stop it and see what is going on inside.

Let's look at a contrived example. Suppose you have the following program bogus.c:

   1 #include <stdio.h>
   2 
   3 /* Print the sum of the integers from 1 to 1000 */
   4 int
   5 main(int argc, char **argv)
   6 {
   7     int i;
   8     int sum;
   9 
  10     sum = 0;
  11     for(i = 0; i -= 1000; i++) {
  12         sum += i;
  13     }
  14     printf("%d\n", sum);
  15     return 0;
  16 }
bogus.c

Let's compile and run it and see what happens:

$ gcc -g3 -o bogus bogus.c
$ ./bogus
-34394132
$

That doesn't look like the sum of 1 to 1000. So what went wrong? If we were clever, we might notice that the test in the for loop is using the mysterious -= operator instead of the <= operator that we probably want. But let's suppose we're not so clever right now—it's four in the morning, we've been working on bogus.c for twenty-nine straight hours, and there's a -= up there because in our befuddled condition we know in our bones that it's the right operator to use. We need somebody else to tell us that we are deluding ourselves, but nobody is around this time of night. So we'll have to see what we can get the computer to tell us.

The first thing to do is fire up gdb, the debugger. This runs our program in stop-motion, letting us step through it a piece at a time and watch what it is actually doing. In the example below gdb is run from the command line. You can also run it directly from Emacs with M-x gdb, which lets Emacs track and show you where your program is in the source file with a little arrow, or (if you are logged in directly on a Zoo machine) by running ddd, which wraps gdb in a graphical user interface.

$ gdb bogus
GNU gdb 4.17.0.4 with Linux/x86 hardware watchpoint and FPU support
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
(gdb) run
Starting program: /home/accts/aspnes/tmp/bogus 
-34394132

Program exited normally.

So far we haven't learned anything. To see our program in action, we need to slow it down a bit. We'll stop it as soon as it enters main, and step through it one line at a time while having it print out the values of the variables.

(gdb) break main
Breakpoint 1 at 0x8048476: file bogus.c, line 9.
(gdb) run
Starting program: /home/accts/aspnes/tmp/bogus 

Breakpoint 1, main (argc=1, argv=0xbffff9ac) at bogus.c:9
9           sum = 0;
(gdb) display sum
1: sum = 1
(gdb) n
10          for(i = 0; i -= 1000; i++)
1: sum = 0
(gdb) display i
2: i = 0
(gdb) n
11              sum += i;
2: i = -1000
1: sum = 0
(gdb) n
10          for(i = 0; i -= 1000; i++)
2: i = -1000
1: sum = -1000
(gdb) n
11              sum += i;
2: i = -1999
1: sum = -1000
(gdb) n
10          for(i = 0; i -= 1000; i++)
2: i = -1999
1: sum = -2999
(gdb) quit
The program is running.  Exit anyway? (y or n) y
$

Here we are using break main to tell the program to stop as soon as it enters main, display to tell it to show us the value of the variables i and sum whenever it stops, and n (short for next) to execute the program one line at a time.

When stepping through a program, gdb displays the line it will execute next as well as any variables you've told it to display. This means that any changes you see in the variables are the result of the previous displayed line. Bearing this in mind, we see that i drops from 0 to -1000 the very first time we hit the top of the for loop and drops to -1999 the next time. So something bad is happening in the top of that for loop, and if we squint at it a while we might begin to suspect that i -= 1000 is not the nice simple test we might have hoped it was.

88.1. My favorite gdb commands

help
Get a description of gdb's commands.
run
Runs your program. You can give it arguments that get passed in to your program just as if you had typed them to the shell. Also used to restart your program from the beginning if it is already running.
quit
Leave gdb, killing your program if necessary.
break
Set a breakpoint, which is a place where gdb will automatically stop your program. Some examples:
  • break somefunction stops before executing the first line somefunction.

  • break 117 stops before executing line number 117.

list
Show part of your source file with line numbers (handy for figuring out where to put breakpoints). Examples:
  • list somefunc lists all lines of somefunc.

  • list 117-123 lists lines 117 through 123.

next
Execute the next line of the program, including completing any procedure calls in that line.
step
Execute the next step of the program, which is either the next line if it contains no procedure calls, or the entry into the called procedure.
finish
Continue until you get out of the current procedure (or hit a breakpoint). Useful for getting out of something you stepped into that you didn't want to step into.
cont

(Or continue). Continue until (a) the end of the program, (b) a fatal error like a Segmentation Fault or Bus Error, or (c) a breakpoint. If you give it a numeric argument (e.g., cont 1000) it will skip over that many breakpoints before stopping.

print

Print the value of some expression, e.g. print i.

display

Like print, but runs automatically every time the program stops. Useful for watching values that change often.

88.2. Debugging strategies

In general, the idea behind debugging is that a bad program starts out sane, but after executing for a while it goes bananas. If you can find the exact moment in its execution where it first starts acting up, you can see exactly what piece of code is causing the problem and have a reasonably good chance of being able to fix it. So a typical debugging strategy is to put in a breakpoint (using break) somewhere before the insanity hits, "instrument" the program (using display) so that you can watch it going insane, and step through it (using next, step, or breakpoints and cont) until you find the point of failure. Sometimes this process requires restarting the program (using run) if you skip over this point without noticing it immediately.

For large or long-running programs, it often makes sense to do binary search to find the point of failure. Put in a breakpoint somewhere (say, on a function that is called many times or at the top of a major loop) and see what the state of the program is after going through the breakpoint 1000 times (using something like cont 1000). If it hasn't gone bonkers yet, try restarting and going through 2000 times. Eventually you bracket the error as occurring (for example) somewhere between the 4000th and 8000th occurrence of the breakpoint. Now try stepping through 6000 times; if the program is looking good, you know the error occurs somewhere between the 6000th and 8000th breakpoint. A dozen or so more experiments should be enough isolate the bug to a specific line of code.

The key to all debugging is knowing what your code is supposed to do. If you don't know this, you can't tell the lunatic who thinks he's Napoleon from lunatic who really is Napoleon. If you're confused about what your code is supposed to be doing, you need to figure out what exactly you want it to do. If you can figure that out, often it will be obvious what is going wrong. If it isn't obvious, you can always go back to gdb.

89. Valgrind

The valgrind program can be used to detect some (but not all) common errors in C programs that use pointers and dynamic storage allocation. On the Zoo, you can run valgrind on your program by putting valgrind at the start of the command line:

valgrind ./my-program arg1 arg2 < test-input

This will run your program and produce a report of any allocations and de-allocations it did. It will also warn you about common errors like using unitialized memory, dereferencing pointers to strange places, writing off the end of blocks allocated using malloc, or failing to free blocks.

You can suppress all of the output except errors using the -q option, like this:

valgrind -q ./my-program arg1 arg2 < test-input

You can also turn on more tests, e.g.

valgrind -q --tool=memcheck --leak-check=yes ./my-program arg1 arg2 < test-input

See valgrind --help for more information about the (many) options, or look at the documentation at http://valgrind.org/ for detailed information about what the output means. For some common valgrind messages, see the examples section below.

89.1. Compilation flags

You can run valgrind on any program (try valgrind ls); it does not require special compilation. However, the output of valgrind will be more informative if you compile your program with debugging information turned on using the -g or -g3 flags (this is also useful if you plan to watch your program running using gdb). See HowToUseTheComputingFacilities for more information about debugging and debugging flags.

89.2. Automated testing

Unless otherwise specified, automated testing of your program will be done using the script in /c/cs223/bin/vg; this runs /c/cs223/bin/valgrind with the --tool=memcheck, --leak-check=yes, and -q options, throws away your program's output, and replaces it with valgrind's output. If you have a program named ./prog, running /c/cs223/bin/vg ./prog should produce no output.

89.3. Examples of some common valgrind errors

Here are some examples of valgrind output. In each case the example program is compiled with -g3 so that valgrind can report line numbers from the source code.

89.3.1. Uninitialized values

Consider this unfortunate program, which attempts to compare two strings, one of which we forgot to ensure was null-terminated:

   1 #include <stdio.h>
   2 
   3 int
   4 main(int argc, char **argv)
   5 {
   6     char a[2];
   7 
   8     a[0] = 'a';
   9 
  10     if(!strcmp(a, "a")) {
  11         puts("a is \"a\"");
  12     }
  13 
  14     return 0;
  15 }
uninitialized.c

Run without valgrind, we see no errors, because we got lucky and it turned out our hand-built string was null-terminated anyway:

$ ./uninitialized 
a is "a"

But valgrind is not fooled:

$ valgrind -q ./uninitialized 
==4745== Conditional jump or move depends on uninitialised value(s)
==4745==    at 0x4026663: strcmp (mc_replace_strmem.c:426)
==4745==    by 0x8048435: main (uninitialized.c:10)
==4745== 
==4745== Conditional jump or move depends on uninitialised value(s)
==4745==    at 0x402666C: strcmp (mc_replace_strmem.c:426)
==4745==    by 0x8048435: main (uninitialized.c:10)
==4745== 
==4745== Conditional jump or move depends on uninitialised value(s)
==4745==    at 0x8048438: main (uninitialized.c:10)
==4745== 

Here we get a lot of errors, but they are all complaining about the same call to strcmp. Since it's unlikely that strcmp itself is buggy, we have to assume that we passed some uninitialized location into it that it is looking at. The fix is to add an assignment a[1] = '\0' so that no such location exists.

89.3.2. Bytes definitely lost

Here is a program that calls malloc but not free:

   1 #include <stdio.h>
   2 #include <stdlib.h>
   3 
   4 int
   5 main(int argc, char **argv)
   6 {
   7     char *s;
   8 
   9     s = malloc(26);
  10 
  11     return 0;
  12 }
missing_free.c

With no extra arguments, valgrind will not look for this error. But if we turn on --leak-check=yes, it will complain:

$ valgrind -q --leak-check=yes ./missing_free
==4776== 26 bytes in 1 blocks are definitely lost in loss record 1 of 1
==4776==    at 0x4024F20: malloc (vg_replace_malloc.c:236)
==4776==    by 0x80483F8: main (missing_free.c:9)
==4776== 

Here the stack trace in the output shows where the bad block was allocated: inside malloc (specifically the paranoid replacement malloc supplied by valgrind), which was in turn called by main in line 9 of missing_free.c. This lets us go back and look at what block was allocated in that line and try to trace forward to see why it wasn't freed. Sometimes this is as simple as forgetting to include a free statement anywhere, but in more complicated cases it may be because I somehow lose the pointer to the block by overwriting the last variable that points to it or by embedding it in some larger structure whose components I forget to free individually.

89.3.3. Invalid write or read operations

These are usually operations that you do off the end of a block from malloc or on a block that has already been freed.

An example of the first case:

   1 #include <stdio.h>
   2 #include <stdlib.h>
   3 #include <assert.h>
   4 
   5 int
   6 main(int argc, char **argv)
   7 {
   8     char *s;
   9 
  10     s = malloc(1);
  11     s[0] = 'a';
  12     s[1] = '\0';
  13 
  14     puts(s);
  15 
  16     return 0;
  17 }
invalid_operations.c

==7141== Invalid write of size 1
==7141==    at 0x804843B: main (invalid_operations.c:12)
==7141==  Address 0x419a029 is 0 bytes after a block of size 1 alloc'd
==7141==    at 0x4024F20: malloc (vg_replace_malloc.c:236)
==7141==    by 0x8048428: main (invalid_operations.c:10)
==7141== 
==7141== Invalid read of size 1
==7141==    at 0x4026063: __GI_strlen (mc_replace_strmem.c:284)
==7141==    by 0x409BCE4: puts (ioputs.c:37)
==7141==    by 0x8048449: main (invalid_operations.c:14)
==7141==  Address 0x419a029 is 0 bytes after a block of size 1 alloc'd
==7141==    at 0x4024F20: malloc (vg_replace_malloc.c:236)
==7141==    by 0x8048428: main (invalid_operations.c:10)
==7141== 

An example of the second:

   1 #include <stdio.h>
   2 #include <stdlib.h>
   3 #include <assert.h>
   4 
   5 int
   6 main(int argc, char **argv)
   7 {
   8     char *s;
   9 
  10     s = malloc(2);
  11     free(s);
  12 
  13     s[0] = 'a';
  14     s[1] = '\0';
  15 
  16     puts(s);
  17 
  18     return 0;
  19 }
freed_block.c

==7144== Invalid write of size 1
==7144==    at 0x804846D: main (freed_block.c:13)
==7144==  Address 0x419a028 is 0 bytes inside a block of size 2 free'd
==7144==    at 0x4024B3A: free (vg_replace_malloc.c:366)
==7144==    by 0x8048468: main (freed_block.c:11)
==7144== 
==7144== Invalid write of size 1
==7144==    at 0x8048477: main (freed_block.c:14)
==7144==  Address 0x419a029 is 1 bytes inside a block of size 2 free'd
==7144==    at 0x4024B3A: free (vg_replace_malloc.c:366)
==7144==    by 0x8048468: main (freed_block.c:11)
==7144== 
==7144== Invalid read of size 1
==7144==    at 0x4026058: __GI_strlen (mc_replace_strmem.c:284)
==7144==    by 0x409BCE4: puts (ioputs.c:37)
==7144==    by 0x8048485: main (freed_block.c:16)
[... more lines of errors deleted ...]

In both cases the problem is that we are operating on memory that is not guaranteed to be allocated to us. For short programs like these, we may get lucky and have the program work anyway. But we still want to avoid bugs like this because we might not get lucky.

How do we know which case is which? If I write off the end of an existing block, I'll see something like Address 0x419a029 is 0 bytes after a block of size 1 alloc'd, telling me that I am working on an address after a block that is still allocated. When I try to write to a freed block, the message changes to Address 0x419a029 is 1 bytes inside a block of size 2 free'd, where the free'd part tells me I freed something I probably shouldn't have. Fixing the first class of bugs is usually just a matter of allocating a bigger block (but don't just do this without figuring out why you need a bigger block, or you'll just be introducing random mutations into your code that may cause other problems elsewhere). Fixing the second class of bugs usually involves figuring out why you freed this block prematurely. In some cases you may need to re-order what you are doing so that you don't free a block until you are completely done with it.

90. Not recommended: debugging output

A tempting but usually bad approach to debugging is to put lots of printf statements in your code to show what is going on. The problem with this compared to using assert is that there is no built-in test to see if the output is actually what you'd expect. The problem compared to gdb is that it's not flexible: you can't change your mind about what is getting printed out without editing the code. A third problem is that the output can be misleading: in particular, printf output is usually buffered, which means that if your program dies suddenly there may be output still in the buffer that is never flushed to stdout. This can be very confusing, and can lead you to believe that your program fails earlier than it actually does.

If you really need to use printf or something like it for debugging output, here are a few rules of thumb to follow to mitigate the worst effects:

  1. Use fprintf(stderr, ...) instead of printf(...); this allows you to redirect your program's regular output somewhere that keeps it separate from the debugging output (but beware of misleading interleaving of the two streams—buffering may mean that output to stdout and stderr appears to arrive out of order). It also helps that output to stderr is usually unbuffered, avoiding the problem of lost output.

  2. If you must output to stdout, put fflush(stdout) after any output operation you suspect is getting lost in the buffer. The fflush function forces any buffered output to be emitted immediately.

  3. Keep all arguments passed to printf as simple as possible and beware of faults in your debugging code itself. If you write printf("a[key] == %d\n", a[key]) and key is some bizarre value, you will never see the result of this printf because your program will segfault while evaluating a[key]. Naturally, this is more likely to occur if the argument is a[key]->size[LEFTOVERS].cleanupFunction(a[key]) than if it's just a[key], and if it happens it will be harder to figure out where in this complex chain of array indexing and pointer dereferencing the disaster happened. Better is to wait for your program to break in gdb, and use the print statement on increasingly large fragments of the offending expression to see where the bogus array index or surprising null pointer is hiding.


CategoryProgrammingNotes

91. TestingDuringDevelopment

It is a truth universally acknowledged that test code should be written early in the development process. Unfortunately, most programmers (including me) tend to assume that a program will work on the first attempt and there's not much point in testing it anyway, so writing and running test code often gets deferred indefinitely. The solution is to write the test code first, and run it directly from your Makefile every time you save and compile your program. Not only will this guarantee that your program actually works when you are done (or at least passes the tests you thought of), it allows you to see how the program is improving with each positive change, and prevents you from accidentally making new negative changes that break things that used to work.

Going on step further, we can often write our interface and test code first, build a non-working stub implementation, and then slowly flesh out the missing pieces until the implementation passes all the tests. This way there is always some obvious step to do next, and we don't find ourselves stuck staring at an empty file.

92. Setting up the test harness

92.1. Module interface

The module will be a stack for storing integers.

Let's start with the interface, which we'll put in a file called stack.h:

92.1.1. stack.h

   1 /*
   2  * This is an "opaque struct"; it discourages people from looking at
   3  * the inside of our structure.  The actual definiton of struct stack
   4  * is contained in stack.c.
   5  */
   6 typedef struct stack *Stack;
   7 
   8 /* constructor and destructor */
   9 Stack stack_create(void);              /* returns 0 on allocation error */
  10 void stack_destroy(Stack);
  11 
  12 /* push a new element onto the stack */
  13 void stack_push(Stack , int new_element);
  14 
  15 /* return 1 if the stack is empty, 0 otherwise */
  16 int stack_isempty(Stack);
  17 
  18 /* remove and return top element of stack */
  19 /* returns STACK_EMPTY if stack is empty */
  20 #define STACK_EMPTY (-1)
  21 int stack_pop(Stack);

Our intent is that an Stack acts like a stack--- we push things onto it using stack_push, and then pull them off again in reverse order using stack_pop. Ideally, we don't ever pop the stack when it's empty (which we can detect using stack_isempty), but if we do, we have stack_pop return something well-defined.

92.2. Test code

Let's write some test code to try this out. Because our initial stack implementation may be exceptionally bug-ridden, we'll use a test harness that provides macros for detecting and intercepting segmentational faults and similar disasters. The various testing wrappers are defined in the files tester.h and tester.c, shown in the Appendix; you should feel free to use it for your own purposes. I've added line numbers in comments to all the TEST lines so we can find them again later.

92.2.1. test-stack.c

   1 #include <stdio.h>
   2 #include <setjmp.h>
   3 #include <signal.h>
   4 #include <unistd.h>
   5 
   6 #include <stdlib.h>
   7 
   8 #include "stack.h"
   9 #include "tester.h"
  10 
  11 #define STRESS_TEST_ITERATIONS (1000000)
  12 
  13 int
  14 main(int argc, char **argv)
  15 {
  16     Stack s;
  17     int i;
  18 
  19     tester_init();
  20 
  21     /* first we need to build one */
  22     TRY { s = stack_create(); } ENDTRY;
  23 
  24     /* 25 */ TEST_ASSERT(s != 0);
  25 
  26     /* now we'll try pushing and popping a bit */
  27     TRY { stack_push(s, 1); } ENDTRY;
  28     TRY { stack_push(s, 2); } ENDTRY;
  29     TRY { stack_push(s, 3); } ENDTRY;
  30 
  31     /* 32 */ TEST(stack_isempty(s), 0);
  32     /* 33 */ TEST(stack_pop(s), 3);
  33     /* 34 */ TEST(stack_isempty(s), 0);
  34     /* 35 */ TEST(stack_pop(s), 2);
  35     /* 36 */ TEST(stack_isempty(s), 0);
  36     /* 37 */ TEST(stack_pop(s), 1);
  37     /* 38 */ TEST(stack_isempty(s), 1);
  38     /* 39 */ TEST(stack_pop(s), STACK_EMPTY);
  39     /* 40 */ TEST(stack_isempty(s), 1);
  40     /* 41 */ TEST(stack_pop(s), STACK_EMPTY);
  41 
  42     /* can we still push after popping too much? */
  43     TRY { stack_push(s, 4); } ENDTRY;
  44     /* 45 */ TEST(stack_isempty(s), 0);
  45     /* 46 */ TEST(stack_pop(s), 4);
  46     /* 47 */ TEST(stack_isempty(s), 1);
  47     /* 48 */ TEST(stack_pop(s), STACK_EMPTY);
  48     /* 49 */ TEST(stack_isempty(s), 1);
  49 
  50     /* let's do some stress testing */
  51     /* we won't use TEST for this because we might get too much output */
  52     TRY {
  53         for(i = 0; i < STRESS_TEST_ITERATIONS; i++) {
  54             stack_push(s, i);
  55         }
  56         for(i = 0; i < STRESS_TEST_ITERATIONS; i++) {
  57             stack_push(s, 957);
  58             if(stack_pop(s) != 957) {
  59                 /* 60 */ FAIL("wanted 957 but didn't get it");
  60                 abort();
  61             }
  62         }
  63         for(i = STRESS_TEST_ITERATIONS - 1; i >= 0; i--) {
  64             if(stack_isempty(s)) {
  65                 /* 66 */ FAIL("stack empty too early");
  66                 abort();
  67             }
  68             if(stack_pop(s) != i) {
  69                 /* 70 */ FAIL("got wrong value!");
  70                 abort();
  71             }
  72         }
  73     } ENDTRY; /* 74 */
  74 
  75     /* 76 */ TEST(stack_isempty(s), 1);
  76 
  77     TRY { stack_destroy(s); } ENDTRY;
  78 
  79     tester_report(stdout, argv[0]);
  80     return tester_result();
  81 }

There is a lot of test code here. In practice, we might write just a few tests to start off with, and, to be honest, I didn't write all of this at once. But you can never have too many tests--- if nothing else, they give an immediate sense of gratification as the number of failed tests drops.

92.3. Makefile

  • Finally, we'll write a Makefile:

92.3.1. Makefile

CC=gcc
CFLAGS=-g3 -Wall -ansi -pedantic

all:

test: test-stack
        ./test-stack 
        @echo OK!

test-stack: test-stack.o tester.o stack.o
        $(CC) $(CFLAGS) -o $@ $^

test-stack.o: stack.h tester.h
stack.o: stack.h

Note that we don't provide a convenient shortcut for building test-stack without running it. That's because we want to run the test code every single time.

93. Stub implementation

Of course, we still can't compile anything, because we don't have any implementation. Let's fix that. To make it easy to write, we will try to add as little as possible to what we already have in stack.h:

93.1. stack.c

   1 #include <stdlib.h>     
   2 #include "stack.h"
   3 
   4 struct stack { int dummy; };
   5 Stack stack_create(void) { return malloc(sizeof(struct stack)); }
   6 void stack_destroy(Stack s) { free(s); }
   7 void stack_push(Stack s, int elem) { ; }
   8 int stack_pop(Stack s) { return STACK_EMPTY; }
   9 int stack_isempty(Stack s) { return 1; }

Will this work? Of course not. There's hardly any code! But maybe it will compile if we run make test:

$ make test
gcc -g3 -Wall -ansi -pedantic   -c -o test-stack.o test-stack.c
gcc -g3 -Wall -ansi -pedantic   -c -o tester.o tester.c
gcc -g3 -Wall -ansi -pedantic   -c -o stack.o stack.c
gcc -g3 -Wall -ansi -pedantic -o test-stack test-stack.o tester.o stack.o
./test-stack 
test-stack.c:32: TEST FAILED: stack_isempty(s) -> 1 but expected 0
test-stack.c:33: TEST FAILED: stack_pop(s) -> -1 but expected 3
test-stack.c:34: TEST FAILED: stack_isempty(s) -> 1 but expected 0
test-stack.c:35: TEST FAILED: stack_pop(s) -> -1 but expected 2
test-stack.c:36: TEST FAILED: stack_isempty(s) -> 1 but expected 0
test-stack.c:37: TEST FAILED: stack_pop(s) -> -1 but expected 1
test-stack.c:45: TEST FAILED: stack_isempty(s) -> 1 but expected 0
test-stack.c:46: TEST FAILED: stack_pop(s) -> -1 but expected 4
test-stack.c:60: wanted 957 but didn't get it
test-stack.c:74: Aborted (signal 6)
./test-stack: errors 8/17, signals 1, FAILs 1
make[1]: *** [test] Error 8

Hooray! It compiles on the first try! (Well, not really, but let's pretend it did.) Unfortunately, it only passes any tests at all by pure dumb luck. But now we just need to get the code to pass a few more tests.

94. Bounded-space implementation

Here's a first attempt at a stack that suffers from some artificial limits. We retain the structure of the original broken implementation, we just put a few more lines of code in and format it more expansively.

94.1. stack.c

   1 #include <stdlib.h>     
   2 #include "stack.h"
   3 
   4 #define MAX_STACK_SIZE (100)
   5 
   6 struct stack { 
   7     int top;
   8     int data[MAX_STACK_SIZE];
   9 };
  10 
  11 Stack 
  12 stack_create(void)
  13 {
  14     struct stack *s;
  15     
  16     s = malloc(sizeof(*s));
  17     s->top = 0;
  18     return s;
  19 }
  20 
  21 void
  22 stack_destroy(Stack s)
  23 {
  24     free(s);
  25 }
  26 
  27 void
  28 stack_push(Stack s, int elem)
  29 {
  30     s->data[(s->top)++] = elem;
  31 }
  32 
  33 int
  34 stack_pop(Stack s)
  35 {
  36     return s->data[--(s->top)];
  37 }
  38 
  39 int
  40 stack_isempty(Stack s)
  41 {
  42     return s->top == 0;
  43 }

Let's see what happens now:

$ make test
gcc -g3 -Wall -ansi -pedantic   -c -o test-stack.o test-stack.c
gcc -g3 -Wall -ansi -pedantic   -c -o tester.o tester.c
gcc -g3 -Wall -ansi -pedantic   -c -o stack.o stack.c
gcc -g3 -Wall -ansi -pedantic -o test-stack test-stack.o tester.o stack.o
./test-stack 
test-stack.c:40: TEST FAILED: stack_isempty(s) -> 0 but expected 1
test-stack.c:41: TEST FAILED: stack_pop(s) -> 409 but expected -1
test-stack.c:47: TEST FAILED: stack_isempty(s) -> 0 but expected 1
test-stack.c:48: TEST FAILED: stack_pop(s) -> 0 but expected -1
test-stack.c:49: TEST FAILED: stack_isempty(s) -> 0 but expected 1
test-stack.c:74: Segmentation fault (signal 11)
test-stack.c:76: TEST FAILED: stack_isempty(s) -> 0 but expected 1
free(): invalid pointer 0x804b830!
./test-stack: errors 6/17, signals 1, FAILs 0
make[1]: *** [test] Error 6

There are still errors, but we get past several initial tests before things blow up. Looking back at the line numbers in test-stack.c, we see that the first failed test is the one that checks if the stack is empty after we pop from an empty stack. The code for stack_isempty looks pretty clean, so what happened? Somewhere s->top got set to a nonzero value, and the only place this can happen is inside stack_pop. Aha! There's no check in stack_pop for an empty stack, so it's decrementing s->top past 0. (Exercise: why didn't the test of stack_pop fail?)

95. First fix

If we're lucky, fixing this problem will make the later tests happier. Let's try a new version of stack_pop. We'll leave everything else the same.

   1 int
   2 stack_pop(Stack s)
   3 {
   4     if(stack_isempty(s)) {
   5         return STACK_EMPTY;
   6     } else {
   7         return s->data[--(s->top)];
   8     }

}

And now we get:

$ make test
gcc -g3 -Wall -ansi -pedantic   -c -o test-stack.o test-stack.c
gcc -g3 -Wall -ansi -pedantic   -c -o tester.o tester.c
gcc -g3 -Wall -ansi -pedantic   -c -o stack.o stack.c
gcc -g3 -Wall -ansi -pedantic -o test-stack test-stack.o tester.o stack.o
./test-stack 
test-stack.c:74: Segmentation fault (signal 11)
test-stack.c:76: TEST FAILED: stack_isempty(s) -> 0 but expected 1
./test-stack: errors 1/17, signals 1, FAILs 0
make[1]: *** [test] Error 1

Which is much nicer. We are still failing the stress test, but that's not terribly surprising.

96. Final version

After some more tinkering, this is what I ended up with. This version uses a malloc'd data field, and realloc's it when the stack gets too big.

96.1. stack.c

   1 #include <stdlib.h>     
   2 #include "stack.h"
   3 
   4 struct stack { 
   5     int top;    /* first unused slot in data */
   6     int size;   /* number of slots in data */
   7     int *data;  /* stack contents */
   8 };
   9 
  10 #define INITIAL_STACK_SIZE (1)
  11 #define STACK_SIZE_MULTIPLIER (2)
  12 
  13 Stack 
  14 stack_create(void)
  15 {
  16     struct stack *s;
  17 
  18     s = malloc(sizeof(*s));
  19     if(s == 0) return 0;
  20     
  21     s->top = 0;
  22     s->size = INITIAL_STACK_SIZE;
  23     s->data = malloc(s->size * sizeof(*(s->data)));
  24     if(s->data == 0) return 0;
  25 
  26     /* else everything is ok */
  27     return s;
  28 }
  29 
  30 void
  31 stack_destroy(Stack s)
  32 {
  33     free(s->data);
  34     free(s);
  35 }
  36 
  37 void
  38 stack_push(Stack s, int elem)
  39 {
  40     if(s->top == s->size) {
  41         /* need more space */
  42         s->size *= STACK_SIZE_MULTIPLIER;
  43         s->data = realloc(s->data, s->size * sizeof(*(s->data)));
  44         if(s->data == 0) {
  45             abort();    /* we have no other way to signal failure :-( */
  46         }
  47     }
  48     /* now there is enough room */
  49     s->data[s->top++] = elem;
  50 }
  51 
  52 int
  53 stack_pop(Stack s)
  54 {
  55     if(stack_isempty(s)) {
  56         return STACK_EMPTY;
  57     } else {
  58         return s->data[--(s->top)];
  59     }
  60 }
  61 
  62 int
  63 stack_isempty(Stack s)
  64 {
  65     return s->top == 0;
  66 }

At last we have a version that passes all tests:

$ make test
gcc -g3 -Wall -ansi -pedantic   -c -o test-stack.o test-stack.c
gcc -g3 -Wall -ansi -pedantic   -c -o tester.o tester.c
gcc -g3 -Wall -ansi -pedantic   -c -o stack.o stack.c
gcc -g3 -Wall -ansi -pedantic -o test-stack test-stack.o tester.o stack.o
./test-stack 
OK!

97. Moral

Writing a big program all at once is hard. If you can break the problem down into little problems, it becomes easier. "Test first" is a strategy not just for getting a well-tested program, but for giving you something easy to do at each step--- it's usually not too hard to write one more test, and it's usually not too hard to get just one test working. If you can keep taking those small, easy steps, eventually you will run out of failed tests and have a working program.

98. Appendix: Test macros

98.1. tester.h

   1 /*
   2  * Test macros.
   3  * 
   4  * Usage:
   5  *
   6  * #include <setjmp.h>
   7  * #include <stdio.h>
   8  * #include <signal.h>
   9  * #include <unistd.h>
  10  *
  11  * tester_init();                  -- Initialize internal data structures.
  12  * tester_report(FILE *, "name");  -- Print report.
  13  * tester_result();                -- Returns # of failed tests.
  14  *
  15  * TRY { code } ENDTRY;
  16  *
  17  * Wraps code to catch seg faults, illegal instructions, etc.  May not be
  18  * nested.
  19  * Prints a warning if a signal is caught.
  20  * To enforce a maximum time, set alarm before entering.
  21  *
  22  * TEST(expr, expected_value);
  23  *
  24  * Evaluates expr (which should yield an integer value) inside a TRY.
  25  * Prints a warning if evaluating expr causes a fault or returns a value
  26  * not equal to expected_value.
  27  *
  28  * TEST_ASSERT(expr)
  29  *
  30  * Equivalent to TEST(!(expr), 0)
  31  *
  32  * You can also cause your own failures with FAIL:
  33  *
  34  * TRY {
  35  *     x = 1;
  36  *     if(x == 2) FAIL("why is x 2?");
  37  * } ENDTRY;
  38  *
  39  * To limit the time taken by a test, call tester_set_time_limit with
  40  * a new limit in seconds, e.g.
  41  *
  42  * tester_set_time_limit(1);
  43  * TRY { while(1); } ENDTRY;
  44  *
  45  * There is an initial default limit of 10 seconds.
  46  * If you don't want any limit, set the limit to 0.
  47  *
  48  */
  49 
  50 /* global data used by macros */
  51 /* nothing in here should be modified directly */
  52 extern struct tester_global_data {
  53     jmp_buf escape_hatch;       /* jump here on surprise signals */
  54     int escape_hatch_active;    /* true if escape hatch is usable */
  55     int tests;                  /* number of tests performed */
  56     int errors;                 /* number of tests failed */
  57     int signals;                /* number of signals caught */
  58     int expr_value;             /* expression value */
  59     int setjmp_return;          /* return value from setjmp */
  60     int try_failed;             /* true if last try failed */
  61     int user_fails;             /* number of calls to FAIL */
  62     int time_limit;             /* time limit for TRY */
  63 } TesterData;
  64 
  65 /* set up system; call this before using macros */
  66 void tester_init(void);
  67 
  68 /* prints a summary report of all errors to f, prefixed with preamble */
  69 /* If there were no errors, nothing is printed */
  70 void tester_report(FILE *f, const char *preamble);
  71 
  72 /* returns number of errors so far. */
  73 int tester_result(void);
  74 
  75 /* set a time limit t for TRY, TEST, TEST_ASSERT etc. */
  76 /* After t seconds, an ALARM signal will interrupt the test. */
  77 /* Set t = 0 to have no time limit. */
  78 /* Default time limit is 10 seconds. */
  79 void tester_set_time_limit(int t);        
  80 
  81 const char *tester_strsignal(int);      /* internal hack; don't use this */
  82 
  83 /* gruesome non-syntactic macros */
  84 #define TRY \
  85     TesterData.try_failed = 0; \
  86     alarm(TesterData.time_limit); \
  87     if(((TesterData.setjmp_return = setjmp(TesterData.escape_hatch)) == 0) \
  88             && (TesterData.escape_hatch_active = 1) /* one = is correct*/)
  89 #define ENDTRY else { \
  90         fprintf(stderr, "%s:%d: %s (signal %d)\n", \
  91             __FILE__, __LINE__, \
  92             tester_strsignal(TesterData.setjmp_return), \
  93             TesterData.setjmp_return); \
  94         TesterData.signals++; \
  95         TesterData.try_failed = 1; \
  96     } \
  97     alarm(0); \
  98     TesterData.escape_hatch_active = 0
  99 
 100 /* another atrocity */
 101 #define TEST(expr, expected_value) \
 102     TesterData.tests++; \
 103     TesterData.errors++; /* guilty until proven innocent */ \
 104     TRY { TesterData.expr_value = (expr); \
 105         if(TesterData.expr_value != expected_value) { \
 106             fprintf(stderr, "%s:%d: TEST FAILED: %s -> %d but expected %d\n", \
 107                     __FILE__, __LINE__, __STRING(expr), \
 108                     TesterData.expr_value, expected_value); \
 109         } else { \
 110             TesterData.errors--; \
 111         } \
 112     } \
 113     ENDTRY; \
 114     if(TesterData.try_failed) \
 115         fprintf(stderr, "%s:%d: TEST FAILED: %s caught signal\n", \
 116                 __FILE__, __LINE__, __STRING(expr))
 117 
 118 #define TEST_ASSERT(expr) TEST((expr) != 0, 1)
 119 #define FAIL(msg) \
 120     (fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, (msg)), \
 121              TesterData.user_fails++, \
 122              TesterData.try_failed = 1)
 123 

98.2. tester.c

   1 #define _GNU_SOURCE     /* get strsignal def */
   2 
   3 #include <stdio.h>
   4 #include <signal.h>
   5 #include <string.h>
   6 #include <setjmp.h>
   7 
   8 #include "tester.h"
   9 
  10 struct tester_global_data TesterData;
  11 
  12 const char *
  13 tester_strsignal(int sig)
  14 {
  15     return strsignal(sig);
  16 }
  17 
  18 static void
  19 tester_sighandler(int signal)
  20 {
  21     if(TesterData.escape_hatch_active) {
  22         TesterData.escape_hatch_active = 0;
  23         longjmp(TesterData.escape_hatch, signal);
  24     }
  25 }
  26 
  27 void
  28 tester_init(void)
  29 {
  30     TesterData.escape_hatch_active = 0;
  31     TesterData.tests = 0;
  32     TesterData.errors = 0;
  33     TesterData.signals = 0;
  34     TesterData.user_fails = 0;
  35 
  36     signal(SIGSEGV, tester_sighandler);
  37     signal(SIGILL, tester_sighandler);
  38     signal(SIGFPE, tester_sighandler);
  39     signal(SIGALRM, tester_sighandler);
  40     signal(SIGBUS, tester_sighandler);
  41     signal(SIGABRT, tester_sighandler);
  42 }
  43 
  44 void
  45 tester_report(FILE *f, const char *preamble)
  46 {
  47     if(TesterData.errors != 0 || TesterData.signals != 0) {
  48         fprintf(f, "%s: errors %d/%d, signals %d, FAILs %d\n", 
  49                 preamble,
  50                 TesterData.errors, 
  51                 TesterData.tests,
  52                 TesterData.signals,
  53                 TesterData.user_fails);
  54     }
  55 }
  56 
  57 int
  58 tester_result(void)
  59 {
  60     return TesterData.errors;
  61 }
  62 
  63 void
  64 tester_set_time_limit(int t)
  65 {
  66     TesterData.time_limit = t;
  67 }


CategoryProgrammingNotes

99. LinkedLists

Linked lists are about the simplest data structure beyond arrays. They aren't very efficient for many purposes, but have very good performance for certain specialized applications.

100. Stacks and linked lists

On my desk is a pile of books that I am supposed to put away on my bookshelf. I don't care much about how they are organized, I just want to be able to dump a book quickly so that I can later go through and put all of them back at once. Because it's easiest just to dump each book on the top of the pile, I have effectively built a data structure called a stack, which supports a push operation (add a book to the top of the pile) and a pop operation (take the top book off and return it).

To build a stack in a program, I can adopt a similar strategy. When a new item appears, I will box it up inside a struct that will also include a pointer to the item below it in the stack. Since that item will point to the item below it, and so on, I end up with an arbitrarily long chain of pointers (usually ending in a null pointer).

The cost of push and pop operations are both O(1): they don't depend on the number of elements in the stack, since they are only working on the top element or two. Doing almost anything else (e.g., finding the k-th element of the stack or searching for a particular value) is likely to be much more expensive, O(n) in the worst case. The reason is that unlike an array, a linked list scatters its contents throughout memory, and the only way to get at a particular element is to crawl through all the ones that precede it.

100.1. Implementation

A very concrete implementation of a stack using a linked list might look like this:

   1 #include <stdlib.h>
   2 #include <string.h>
   3 
   4 /* Functional-style stack */
   5 /* All operations consume the old value and return the new value of the stack */
   6 
   7 typedef struct stack {
   8     char *book;         /* malloc'd name of book */
   9     struct stack *next; /* next item in stack, or 0 if there aren't any more */
  10 } *Stack;
  11 
  12 #define EMPTY_STACK (0)
  13 
  14 /* create a new empty stack */
  15 Stack
  16 stackCreate(void)
  17 {
  18     return EMPTY_STACK;
  19 }
  20 
  21 /* push a new book on the stack */
  22 /* copies second argument book (so you don't need to keep it around) */
  23 /* returns 0 on allocation error */
  24 Stack
  25 stackPush(Stack old, const char *book)
  26 {
  27     Stack new;
  28 
  29     new = malloc(sizeof(*new));
  30     if(new == 0) return 0;
  31 
  32     new->next = old;
  33     new->book = strdup(book);
  34 
  35     if(new->book == 0) {
  36         free(new);
  37         return 0;
  38     }
  39 
  40     return new;
  41 }
  42 
  43 /* pops a book off the stack */
  44 /* returns new tail of stack, stores book in *book */
  45 /* *book is malloc'd data that the caller should free */
  46 /* Stores 0 in *book if stack is empty */
  47 Stack
  48 stackPop(Stack old, char **book /*RETVAL*/)
  49 {
  50     Stack new;
  51 
  52     if(old == 0) {
  53         *book = 0;
  54         return 0;
  55     } else {
  56         new = old->next;
  57         *book = old->book;
  58         free(old);
  59         return new;
  60     }
  61 }
  62 
  63 /* frees a stack */
  64 void
  65 stackDestroy(Stack s)
  66 {
  67     char *book;
  68 
  69     while(s) {
  70         s = stackPop(s, &book);
  71         free(book);
  72     }
  73 }
functional_stack.c

The meat of this implementation is in stackPush and stackPop. These act like a constructor/destructor pair for individual stack elements. The stackPush funciton also does the work of linking the new element into the stack and copying the book argument. The choice to copy book is a design decision: it insulates the contents of the stack from any changes that might happen to book after stackPush exits (perhaps book is a buffer that is reused to read a new line from the input after each call to stackPush), but it may be more than what the user needs, and it forces the user to free the returned value in stackPop. An alternative would be to let the user call strdup for themselves.

100.2. A more complicated implementation

The simple implementation of a stack above doesn't really act like a single object; instead, it acts more like a family of constant values you can do arithmetic on, where applying stackPush or stackPop consumes some old value and generates a new one. For many applications, it is more natural to just think of having a single stack that changes over time.

We can do this with another layer of indirection. Instead of having a Stack be a pointer to the first element, we'll have it be a pointer to a pointer to the first element. This requires a little bit more work inside the implementation but looks nicer from the outside.

   1 #include <stdlib.h>
   2 #include <string.h>
   3 
   4 /* Imperative-style stack */
   5 /* All operations modify the stack in place */
   6 
   7 /* a Stack is a pointer to a pointer to the first element of a linked list */
   8 typedef struct stack {
   9     char *book;         /* malloc'd name of book */
  10     struct stack *next; /* next item in stack, or 0 if there aren't any more */
  11 } **Stack;
  12 
  13 /* create a new empty stack */
  14 Stack
  15 stackCreate(void)
  16 {
  17     Stack s;
  18     
  19     s = malloc(sizeof(struct stack *));
  20     *s = 0;
  21 
  22     return s;
  23 }
  24 
  25 /* push a new book on the stack */
  26 /* copies second argument book (so you don't need to keep it around) */
  27 /* returns 0 on allocation error or 1 on success */
  28 int
  29 stackPush(Stack s, const char *book)
  30 {
  31     struct stack *new;
  32 
  33     new = malloc(sizeof(*new));
  34 
  35     new->next = *s;
  36     new->book = strdup(book);
  37 
  38     if(new->book == 0) {
  39         free(new);
  40         return 0;
  41     }
  42 
  43     *s = new;
  44     return 1;
  45 }
  46 
  47 /* pops a book off the stack */
  48 /* returns 0 if stack is empty */
  49 char *
  50 stackPop(Stack s)
  51 {
  52     struct stack *new;
  53     char *book;
  54 
  55     if(*s == 0) {
  56         return 0;
  57     } else {
  58         book = (*s)->book;
  59 
  60         /* we have to save (*s)->next before we free it */
  61         new = (*s)->next;
  62         free(*s);
  63         *s = new;
  64         return book;
  65     }
  66 }
  67 
  68 /* returns true if s is empty */
  69 int
  70 stackEmpty(Stack s)
  71 {
  72     return (*s) == 0;
  73 }
  74 
  75 /* frees a stack */
  76 void
  77 stackDestroy(Stack s)
  78 {
  79     while(!stackEmpty(s)) {
  80         free(stackPop(s));
  81     }
  82 
  83     free(s);
  84 }
object_stack.c

Here we have added a new stackEmpty routine because it's no longer obvious how to check (and we might someday want to change Stack to some other type where testing (*s) == 0 is no longer the right way to do it).

Note that the structure of the linked list is exactly the same in both implementations: the only difference is that the "imperative" version adds an extra level of indirection at the beginning. Yet another alternative is to have a dummy stack element at the beginning and do pushes and pops starting with the second place in the stack (i.e., replace (*s) with s->next in the second implementation). This consumes more space (we end up allocating space to hold s->book even though we don't put anything there), but can make things simpler for some more complicated linked-list structures.

100.3. Building a stack out of an array

When the elements of a stack are small, or when a maximum number of elements is known in advance, it often makes sense to build a stack from an array (with a variable storing the index of the top element) instead of a linked list. The reason is that pushes and pops only require updating the stack pointer instead of calling malloc or free to allocate space, and pre-allocating is almost always faster than allocating as needed. This is the strategy used to store the function call stack in almost all programs (the exception is in languages like Scheme, where the call stack is allocated on the heap because stack frames may outlive the function call that creates them).

101. Looping over a linked list

Looping over a linked list is not hard if you have access to the next pointers. (For a more abstract way to do this see C/Iterators.)

Let's imagine somebody gave us a pointer to the first struct stack in a list; call this pointer first. Then we can write a loop like this that prints the contents of the stack:

   1 void
   2 stackPrint(struct stack *first)
   3 {
   4     struct stack *elt;
   5 
   6     for(elt = first; elt != 0; elt = elt->next) {
   7         puts(elt->book);
   8     }
   9 }

There's not a whole lot to notice here except that for is perfectly happy to iterate over something that isn't a range of integers. The running time is linear in the length of the list (O(n)).

102. Looping over a linked list backwards

What if we want to loop over a linked list backwards? The next pointers all go the wrong way, so we have to save a trail of breadcrumbs to get back. The safest way to do this is to reverse the original list into an auxiliary list:

   1 void
   2 stackPrintReversed(struct stack *first)
   3 {
   4     struct stack *elt;
   5     Stack s2;                   /* uses imperative implementation */
   6 
   7     s2 = stackCreate();
   8 
   9     for(elt = first; elt != 0; elt = elt->next) {
  10         s2 = stackPush(s2, elt->book);
  11     }
  12 
  13     stackPrint(s2);
  14     stackDestroy(s2);
  15 }

Pushing all the elements from the first list onto s2 puts the first element on the bottom, so when we print s2 out, it's in the reverse order of the original stack.

We can also write a recursive function that prints the elements backwards. This function effectively uses the function call stack in place of the extra stack s2 above.

   1 void
   2 stackPrintReversedRecursive(struct stack *first)
   3 {
   4     if(first != 0) {
   5         /* print the rest of the stack */
   6         stackPrintReversedRecursive(first->next);
   7 
   8         /* then print the first element */
   9         puts(first->book);
  10     }
  11 }

The code in stackPrintReversedRecursive is shorter than the code in stackPrintReversed, and it is likely to be faster since it doesn't require allocating a second stack and copying all the elements. But it will only work for small stacks: because the function call stack is really a fixed-size array, if the input to stackPrintReversedRecursive is too big the recursion will go too deep cause a stack overflow.

If we want to do this sort of thing a lot, we should build a doubly-linked list, with a pointer in each element both to the next element and the previous element instead of a singly-linked list (see below for more).

103. Queues

Stacks are last-in-first-out (LIFO) data structures: when we pop, we get the last item we pushed. What if we want a first-in-first-out (FIFO) data structure? Such a data structure is called a queue and can also be implemented by a linked list. The difference is that if we want O(1) time for both the enqueue (push) and dequeue (pop) operations, we must keep around pointers to both ends of the linked list.

Here is a simple queue holding ints:

   1 #include <stdlib.h>
   2 
   3 struct elt {
   4     int item;
   5     struct elt *next;
   6 };
   7 
   8 struct queue {
   9     struct elt *head;   /* first element in queue, or 0 if empty */
  10     struct elt *tail;   /* last element in queue */
  11 };
  12 
  13 typedef struct queue *Queue;
  14 
  15 Queue
  16 queueCreate(void)
  17 {
  18     Queue q;
  19 
  20     q = malloc(sizeof(*q));
  21 
  22     if(q) {
  23         q->head = 0;
  24     }
  25 
  26     return q;
  27 }
  28 
  29 void
  30 enqueue(Queue q, int item)
  31 {
  32     struct elt *e;
  33 
  34     e = malloc(sizeof(*e));
  35     if(e == 0) abort();
  36 
  37     e->item = item;
  38     e->next = 0;
  39     
  40     if(q->head == 0) {
  41         /* special case for empty queue */
  42         q->head = q->tail = e;
  43     } else {
  44         /* patch e in after tail */
  45         q->tail->next = e;
  46         q->tail = e;
  47     }
  48 }
  49 
  50 int
  51 queueEmpty(Queue q)
  52 {
  53     return q->head == 0;
  54 }
  55 
  56 #define EMPTY_QUEUE (-1)
  57 
  58 /* returns EMPTY_QUEUE if queue is empty */
  59 int
  60 dequeue(Queue q)
  61 {
  62     struct elt *e;
  63     int retval;
  64 
  65     if(queueEmpty(q)) {
  66         return EMPTY_QUEUE;
  67     } else {
  68         /* pop first element off */
  69         e = q->head;
  70         q->head = e->next;
  71 
  72         /* save its contents and free it */
  73         retval = e->item;
  74         free(e);
  75 
  76         return retval;
  77     }
  78 }
queue.c

It is a bit trickier to build a queue out of an array than to build a stack. The difference is that while a stack pointer can move up and down, leaving the base of the stack in the same place, a naive implementation of a queue would have head and tail pointers both marching ever onward across the array leaving nothing but empty cells in their wake. While it is possible to have the pointers wrap around to the beginning of the array when they hit the end, if the queue size is unbounded the tail pointer will eventually catch up to the head pointer. At this point (as in a stack that overflows), it is necessary to allocate more space and copy the old elements over.

104. Deques and doubly-linked lists

Suppose we want a data structure that represents a line of elements where we can push or pop elements at either end. Such a data structure is known as a deque (pronounced like "deck"), and can be implemented with all operations taking O(1) time by a doubly-linked list, where each element has a pointer to both its successor and its predecessor.

An ordinary singly-linked list is not good enough. The reason is that even if we keep a pointer to both ends as in a queue, when it comes time to pop an element off the tail, we have no pointer to its predecessor ready to hand; the best we can do is scan from the head until we get to an element whose successor is the tail, which takes O(n) time.

The solution is to build something like the code below. To avoid special cases, we use a lot of 2-element arrays of pointers, so that pushing or popping from either end of the queue is completely symmetric.

   1 typedef struct deque *Deque;
   2 
   3 #define FRONT (0)
   4 #define BACK (1)
   5 
   6 #define NUM_DIRECTIONS (2)
   7 
   8 #define EMPTY_QUEUE (-1)
   9 
  10 Deque dequeCreate(void);
  11 
  12 void dequePush(Deque d, int direction, int value);
  13 
  14 int dequePop(Deque d, int direction);
  15 
  16 int dequeEmpty(Deque d);
  17 
  18 void dequeDestroy(Deque d);
deque.h

   1 #include <stdlib.h>
   2 #include <assert.h>
   3 
   4 #include "deque.h"
   5 
   6 struct elt {
   7     int value;
   8     struct elt *next[NUM_DIRECTIONS];
   9 };
  10 
  11 struct deque {
  12     struct elt *head[NUM_DIRECTIONS];
  13 };
  14 
  15 Deque
  16 dequeCreate(void)
  17 {
  18     Deque d;
  19 
  20     d = malloc(sizeof(*d));
  21     if(d) {
  22         d->head[FRONT] = d->head[BACK] = 0;
  23     } 
  24 
  25     return d;
  26 }
  27 
  28 void
  29 dequePush(Deque d, int direction, int value)
  30 {
  31     struct elt *e;
  32 
  33     assert(direction == FRONT || direction == BACK);
  34 
  35     e = malloc(sizeof(*e));
  36     if(e == 0) abort();  /* kill process on malloc failure */
  37     
  38     e->next[direction] = d->head[direction];
  39     e->next[!direction] = 0;
  40     e->value = value;
  41 
  42     if(d->head[direction]) {
  43         d->head[direction]->next[!direction] = e;
  44     }
  45 
  46     d->head[direction] = e;
  47 
  48     /* special case for empty deque */
  49     /* e becomes the head on both sides */
  50     if(d->head[!direction] == 0) {
  51         d->head[!direction] = e;
  52     }
  53 }
  54 
  55 int
  56 dequePop(Deque d, int direction)
  57 {
  58     struct elt *e;
  59     int retval;
  60 
  61     assert(direction == FRONT || direction == BACK);
  62 
  63     e = d->head[direction];
  64 
  65     if(e == 0) return EMPTY_QUEUE;
  66 
  67     /* else remove it */
  68     d->head[direction] = e->next[direction];
  69 
  70     if(d->head[direction] != 0) {
  71         d->head[direction]->next[!direction] = 0;
  72     } else { 
  73         /* special case when e is the last element */
  74         /* clear other head field */
  75         d->head[!direction] = 0;
  76     }
  77 
  78     retval = e->value;
  79 
  80     free(e);
  81 
  82     return retval;
  83 }
  84 
  85 int
  86 dequeEmpty(Deque d)
  87 {
  88     return d->head[FRONT] == 0;
  89 }
  90 
  91 void
  92 dequeDestroy(Deque d)
  93 {
  94     while(!dequeEmpty(d)) {
  95         dequePop(d, FRONT);
  96     }
  97 
  98     free(d);
  99 }
deque.c

And here is some test code: test_deque.c, Makefile.

105. Circular linked lists

For some applications, there is no obvious starting or ending point to a list, and a circular list (where the last element points back to the first) may be appropriate. Circular doubly-linked lists can also be used to build deques; a single pointer into the list tracks the head of the deque, with some convention adopted for whether the head is an actual element of the list (at the front, say, with its left neighbor at the back) or a dummy element that is not considered to be part of the list.

The selling point of circular doubly-linked lists as a concrete data structure is that insertions and deletions can be done anywhere in the list with only local information. For example, here are some routines for manipulating a doubly-linked list directly. We'll make our lives easy and assume (for the moment) that the list has no actual contents to keep track of.

   1 #include <stdlib.h>
   2 
   3 /* directions for doubly-linked list next pointers */
   4 #define RIGHT (0)
   5 #define LEFT (1)
   6 
   7 struct elt {
   8     struct elt *next[2];
   9 };
  10 
  11 typedef struct elt *Elt;
  12 
  13 /* create a new circular doubly-linked list with 1 element */
  14 /* returns 0 on allocation error */
  15 Elt
  16 listCreate(void)
  17 {
  18     Elt e;
  19 
  20     e = malloc(sizeof(*e));
  21     if(e) {
  22         e->next[LEFT] = e->next[RIGHT] = e;
  23     }
  24 
  25     return e;
  26 }
  27 
  28 /* remove an element from a list */
  29 /* Make sure you keep a pointer to some other element! */
  30 /* does not free the removed element */
  31 void
  32 listRemove(Elt e)
  33 {
  34     /* splice e out */
  35     e->next[RIGHT]->next[LEFT] = e->next[LEFT];
  36     e->next[LEFT]->next[RIGHT] = e->next[RIGHT];
  37 }
  38     
  39 /* insert an element e into list in direction dir from head */
  40 void
  41 listInsert(Elt head, int dir, Elt e)
  42 {
  43     /* fill in e's new neighbors */
  44     e->next[dir] = head->next[dir];
  45     e->next[!dir] = head;
  46 
  47     /* make neigbhors point back at e */
  48     e->next[dir]->next[!dir] = e;
  49     e->next[!dir]->next[dir] = e;
  50 }
  51 
  52 /* split a list, removing all elements between e1 and e2 */
  53 /* e1 is the leftmost node of the removed subsequence, e2 rightmost */
  54 /* the removed elements are formed into their own linked list */
  55 /* comment: listRemove could be implemented as listSplit(e,e) */
  56 void
  57 listSplit(Elt e1, Elt e2)
  58 {
  59     /* splice out the new list */
  60     e2->next[RIGHT]->next[LEFT] = e1->next[LEFT];
  61     e1->next[LEFT]->next[RIGHT] = e2->next[RIGHT];
  62 
  63     /* fix up the ends */
  64     e2->next[RIGHT] = e1;
  65     e1->next[LEFT] = e2;
  66 }
  67 
  68 /* splice a list starting at e2 after e1 */
  69 /* e2 becomes e1's right neighbor */
  70 /* e2's left neighbor becomes left neighbor of e1's old right neighbor */
  71 void
  72 listSplice(Elt e1, Elt e2)
  73 {
  74     /* fix up tail end */
  75     e2->next[LEFT]->next[RIGHT] = e1->next[RIGHT];
  76     e1->next[RIGHT]->next[LEFT] = e2->next[LEFT];
  77 
  78     /* fix up e1 and e2 */
  79     e1->next[RIGHT] = e2;
  80     e2->next[LEFT] = e1;
  81 }
  82 
  83 /* free all elements of the list containing e */
  84 void
  85 listDestroy(Elt e)
  86 {
  87     Elt target;
  88     Elt next;
  89 
  90     /* we'll free elements until we get back to e, then free e */
  91     /* note use of pointer address comparison to detect end of loop */
  92     for(target = e->next[RIGHT]; target != e; target = next) {
  93         next = target->next[RIGHT];
  94         free(target);
  95     }
  96 
  97     free(e);
  98 }
circular.c

The above code might or might not actually work. What if it doesn't? It may make sense to include some sanity-checking code that we can run to see if our pointers are all going to the right place:

   1 /* assert many things about correctness of the list */
   2 /* Amazingly, this is guaranteed to abort or return no matter
   3    how badly screwed up the list is. */
   4 void
   5 listSanityCheck(Elt e)
   6 {
   7     Elt check;
   8 
   9     assert(e != 0);
  10 
  11     check = e;
  12 
  13     do {
  14 
  15         /* are our pointers consistent with our neighbors? */
  16         assert(check->next[RIGHT]->next[LEFT] == check);
  17         assert(check->next[LEFT]->next[RIGHT] == check);
  18 
  19         /* on to the next */
  20         check = check->next[RIGHT];
  21 
  22     } while(check != e);
  23 }

What if we want to store something in this list? The simplest approach is to extend the definition of struct elt:

   1 struct elt {
   2     struct elt *next[2];
   3     char *name;
   4     int socialSecurityNumber;
   5     int gullibility;
   6 };

But then we can only use the code for one particular type of data. An alternative approach is to define a new Elt-plus struct:

   1 struct fancyElt {
   2     struct elt *next[2];
   3     char *name;
   4     int socialSecurityNumber;
   5     int gullibility;
   6 };

and then use pointer casts to pretend the fancy structs into Elts:

   1     struct fancyElt *e;
   2 
   3     e = malloc(sizeof(*e));
   4 
   5     /* fill in fields on e */
   6 
   7     listInsert(someList, (Elt) e);

The trick here is that as long as the initial part of the struct fancyElt looks like a struct elt, any code that expects a struct elt will happily work with it and ignore the fields that happen to be sitting later in memory. (This trick is how C++ inheritance works.)

The downside is that if something needs to be done with the other fields (e.g freeing e->name if e is freed), then the Elt functions won't know to do this. So if you use this trick you should be careful.

A similar technique using void * pointers is described in C/GenericContainers.

106. What linked lists are and are not good for

Linked lists are good for any task that involves inserting or deleting elements next to an element you already have a pointer to; such operations can usually be done in O(1) time. They generally beat arrays (even resizeable arrays) if you need to insert or delete in the middle of a list, since an array has to copy any elements above the insertion point to make room; if inserts or deletes always happen at the end, an array may be better.

Linked lists are not good for any operation that requires random access, since reaching an arbitrary element of a linked list takes as much as O(n) time. For such applications, arrays are better if you don't need to insert in the middle; if you do, you should use some sort of tree (see BinaryTrees).

107. Further reading

KernighanPike gives an example of a linked list in Section 2.7. A description of many different kinds of linked lists with pictures can be found at Linked_list.


CategoryProgrammingNotes

108. HashTables

For an updated version of these notes, see http://www.cs.yale.edu/homes/aspnes/classes/469/notes.pdf. For practical implementation of hash tables in C, see C/HashTables.

These are theoretical notes on hashing based largely on MotwaniRaghavan §§8.4–8.5 (which is in turn based on work of Carter and Wegman on universal hashing and Fredman, Komlós, and Szemerédi on O(1) worse-case hashing).

109. Hashing: basics

  • Want to store s elements from a universe M of size m in a table with indices drawn from an index space N of size n.
  • Typically we assume M = [m] = {0...m-1} and N = [n] = {0..n-1}.
  • If m ≤ n, just use an array.
  • Otherwise use a hash function h:M→N.

  • When m > n, we always get collisions = pairs (x,y) with h(x) = h(y).

  • Collisions are handled using some secondary data structure in each bin.
    • Linked lists are typical (open addressing).

    • FKS (see below) uses hash tables.
  • Choosing a random hash function from some good family of hash functions can minimize collisions, because the adversary can't tell which elements to include in the table.
  • Goal: Do each search in O(1+s/n) expected time, which for s ≤ n will be much better than the Θ(log n) time for balanced trees or SkipLists.

This only works if we are working in a RAM (random-access machine model), where we can access arbitrary memory locations in time O(1) and similarly compute arithmetic operations on O(log M)-bit values in time O(1). There is an argument that in reality any actual RAM machine requires either Ω(log N) time to read one of N memory locations (routing costs) or, if one is particularly pedantic, Ω(N1/3) time (speed of light + finite volume for each location). We will ignore this argument.

110. Universal hash families

  • Family of hash functions H is 2-universal if for any x≠y, Pr[h(x)=h(y)] ≤ 1/n for random h∈H.

    • Strongly 2-universal if for any x1≠x2∈M, y1,y2∈N, Pr[h(x1)=y1 ∧ h(x2) = y2] ≤ 1/n2 for random h∈H.

    • k-universal usually means strongly k-universal: Given distinct x1...xk, and any y1...yk Pr[h(xi)=yi) ∀i] ≤ 1/nk. It is possible to generalize the weak version of 2-universality to get a weak version of k-universality (Pr[h(xi) are all equal] ≤ 1/nk-1), but this generalization is not as useful as strong k-universality.

  • Let δ(x,y,h) = 1 if x≠y and h(x)=h(y), 0 otherwise.
    • Abuse of notation: δ(X,Y,H) = ∑x∈X,y∈Y,h∈H δ(x,y,h), with e.g. δ(x,Y,h) = δ({x},Y,{h}).

  • H is 2-universal = δ(x,y,H) ≤ |H|/n.
    • If H is all functions M→N, we get equality; but we might do better since h might tend to map distinct values to distinct places.
    • Lower bound: For any family H, ∃x,y s.t. δ(x,y,H) ≥ |H|/n - |H|/m
      • Proof: Count collisions in inverse images of each elements z.
        • Have δ(h-1(z), h-1(z), h) = |h-1(z)|⋅(|h-1(z)|-1).

        • Sum over all z to get δ(M,M,h) ≥ ∑z |h-1(z)|⋅(|h-1(z)|-1).

        • Use convexity or Lagrange multipliers to argue RHS is minimized subject to ∑z |h-1(z)| = m when all pre-images are the same size m/n.

        • This gives δ(M,M,h) ≥ n(m/n)(m/n-1) = m2(1/n - 1/m).

        • Sum over all h to get δ(M,M,H) ≥ |H| m2(1/n - 1/m).

        • PigeonholePrinciple says some pair x,y has δ(x,y,H) ≥ δ(M,M,H)/(m(m-1)) > |H| m2(1/n - 1/m). (Note a bit of slop there.)

        • Since 1/m is usually << 1/n, we are happy if we get the 2-universal upper bound of |H|/n.

    • Claim: hab(x) = (ax + b mod p) mod n, where a∈ℤp-{0}, b∈ℤp and p is a prime ≥ m, is 2-universal.

      • Proof: Again, count collisions.
        • Split hab(x) as g(fab(x)) where fab(x) = ax+b mod p and g(x) = x mod n.

        • Claim: If x≠y, then δ(x,y,H) = δ(ℤp,ℤp,g).

          • Proof:
            • Let r≠s ∈ Zp.

            • Then ax+b = r and ay+b = s has a unique solution mod p (because ℤp is a FiniteField).

            • ⇒ ∃ fab is a 1-1 map between a,b and r,s

            • ⇒ δ(x,y,H) = ∑r≠s δ(r,s,g) = δ(ℤp,ℤp,g).

        • For fixed r, ⌈p/n⌉ is upper bound on number of elements in g-1(g(r)), subtract one to get number of s that collide with r.

        • So total collisions δ(ℤp,ℤp,g) ≤ p(⌈p/n⌉-1) ≤ p(p-1)/n = |H|/n.

  • 2-universal hash family ⇒ open addressing costs O(1+s/n) expected time per operation.
    • Proof: E[cost of operation on key x] = O(1 + E[size of linked list at h(x)]) = O(1 + E[# of y∈S that collide with x]) = O(1 + s Pr[y≠x collides with x]) = O(1+s/n).

111. FKS hashing with O(s) space and O(1) worst-case search time

Goal is to hash a static set S so that we never pay more than constant time for search (not just in expectation).

  • A perfect hash function for a set S is a hash function h:M→N that is injective on S (x≠y ⇒ h(x)≠h(y) for x,y∈S).

  • Claim: If H is 2-universal and s2 ≤ n, then there is a perfect h∈H for S.

    • Proof: Usual collision-counting argument.
      • For all x≠y, have δ(x,y,H) ≤ |H|/n.
      • So δ(S,S,H) ≤ s(s-1) |H|/n.
      • PigeonholePrinciple says ∃h∈H such that δ(S,S,h) ≤ s(s-1)/n < 1.

      • δ(S,S,h) < 1 ⇒ δ(S,S,h) = 0 (since δ(S,S,h) is always an integer).

    • Practical variant: If s2 ≤ αn, then Pr[h is perfect for S] > 1/α.

  • This immediately gives O(1) search time with O(s2) space.

  • Next step: Hash to n = s bins, then rehash perfectly within each bin.
    • Time cost is O(1) for outer hash + O(1) for inner (perfect) hash, giving O(1) total.
    • Space cost is ∑i O(si2), where si = | { s∈S | h(s) = i } | = | h-1(i) |.

      • Observe that ∑i si2 = ∑i (si + si(si-1)) = s + δ(S,S,h).

      • Since H is 2-universal, have δ(S,S,H) ≤ |H| s(s-1)/n.
      • PigeonholePrinciple says ∃h such that δ(S,S,h) ≤ (1/|H|) δ(S,S,h) ≤ s(s-1)/n = s-1 < s.

      • So space cost is really O(s) + O(s) = O(s).
    • Same practical consideration as above applies, by making s = 2n (say) get a constant probability of choosing a good h, keep trying until we get one.

112. Cuckoo hashing

Goal: Get O(1) search time in a dynamic hash table at the cost of a messy insertion procedure. In fact, each search takes only two reads, which can be done in parallel; this is optimal by a lower bound of Pagh probe.pdf, which shows a matching upper bound for static dictionaries. Cuckoo hashing is an improved version of this result that allows for dynamic insertions.

We'll mostly be following the presentation in the original cuckoo hashing paper by Pagh and Rodler: cuckoo-jour.pdf.

112.1. Structure

We have two tables T1 and T2 of size r each, with separate, independent hash functions h1 and h2. These functions are assumed to be k-universal for some sufficiently large value k; as long as we never look at more than k values at once, this means we can treat them effectively as random functions. (In practice, using crummy hash functions seems to work just fine, a common property of hash tables.)

Every key x is stored either in T1[h1(x)] or T2[h2(x)]. So the search procedure just looks at both of these locations and returns whichever one contains x (or fails if neither contains x).

To insert a value x1, we put it in T1[h1(x1)] or T2[h2(x1)]. If one or both of these locations is empty, we put it there. Otherwise we have to kick out some value that is in the way (this is the cuckoo part of cuckoo hashing, named after the bird that leaves its eggs in other birds' nests). So we let x2 = T1[h1(x1)], and insert x1 in T1[h1(x1)]. We now have a new "nestless" value x2, which we swap with whatever is in T2[h2(x2)]. If that location was empty, we are done; otherwise, we get a new value x3 that we have to put in T1 and so on. The procedure terminates when we find an empty spot or if enough iterations have passed that we don't expect to find an empty spot, in which case we rehash the entire table. The code from the Pagh-Rolder paper looks like this:

  • procedure insert(x)
    • if lookup(x) then return
    • else do Maxloop time
      • x ↔ T1[h1(x)]

      • if x = ⊥ then return
      • x ↔ T2[h2(x)]

      • if x = ⊥ then return
    • end loop
    • rehash()
    • insert(x)
  • end

A detail not included in the above code is that we always rehash (in theory) after r2 insertions; this avoids potential problems with the hash functions used in the paper not being universal enough.

The main question is how long it takes the insertion procedure to terminate, assuming the table is not too full. Letting s = |S|, we will assume r/s ≥ 1+ε for some fixed ε.

First let's look at what happens during an insert if we have a lot of nestless values. We have a sequence of values x1, x2, where each pair of values xi, xi+1 collides in h1 or h2. Assuming we don't reach the MaxLoop limit, there are three main possibilities (the leaves of the tree below):

  1. Eventually we reach an empty position without seeing the same key twice.
  2. Eventually we see the same key twice; there is some i and j>i such that xj=xi. Since xi was already moved once, when we reach it the second time we will try to move it back, displacing xi-1. This process continues until we have restored x2 to T1[h1(x1)], displacing x2 to T2[h2(x1)] and possibly creating a new sequence of nestless values. Two outcomes are now possible:

    1. Some xl is moved to an empty location. We win!

    2. Some xl is moved to a location we've already looked at. We lose! In this case we are playing musical chairs with more players than chairs, and have to rehash.

Let's look at the probability that we get the last, closed loop case. Following Pagh-Rolder, we let v be the number of distinct nestless keys in the loop. We can now count how many different ways such a loop can form: There are v3 choices for i, j, and l, rv-1 choices of cells for the loop, and sv-1 choices for the non-x1 elements of the loop. We also have 2v edges each of which occurs with probability r-1, giving a total probability of v3rv-1sv-1r-2v = v3(s/r)v/(sn). Summing this over all v gives 1/(rs) ∑ v3(s/r)v = O(1/(rs)) = O(1/r2) (the series converges under the assumption that s/r < 1). Since the cost of hitting a closed loop is O(r), this adds O(1) to the insertion complexity.

Now we look at what happens if we don't get a closed loop. It's a little messy to analyze the behavior of keys that appear more than once in the sequence, so the trick used in the paper is to observe that for any sequences of nestless keys x1...xp, there is a subsequence of size p/3 with no repetitions that starts with x1. Since there are only two subsequences that starts with x1 (we can't have the same key show up more than twice), this will either be x1...xj-1 or x1=xi+j-1...xp, and a case analysis shows that at least one of these will be big. We can then argue that the probability that we get a sequence of v distinct keys starting with x1 in either T1 or T2 is at most 2(s/r)v-1 (since we have to hit a nonempty spot, with probability ≤ s/r, at each step, but there are two possible starting locations), which gives an expected insertion time bounded by ∑ 3v (s/r)v-1 = O(1).

Using a slightly more detailed analysis (see the paper), it can be shown that the bound for non-constant ε is O(1+1/ε).

113. Bloom filters

See MitzenmacherUpfal §5.5.3 for basics and a principled analysis or Bloom filter for many variations and the collective wisdom of the unwashed masses.


CategoryRandomizedAlgorithmsNotes

114. C/FunctionPointers

115. Basics

A function pointer, internally, is just the numerical address for the code for a function. When a function name is used by itself without parentheses, the value is a pointer to the function, just as the name of an array by itself is a pointer to its zeroth element. Function pointers can be stored in variables, structs, unions, and arrays and passed to and from functions just like any other pointer type. They can also be called: a variable of type function pointer can be used in place of a function name.

116. Function pointer declarations

A function pointer declaration looks like a function declaration, except that the function name is wrapped in parentheses and preceded by an asterisk. For example:

   1 /* a function taking two int arguments and returning an int */
   2 int function(int x, int y);
   3 
   4 /* a pointer to such a function */
   5 int (*pointer)(int x, int y);

As with function declarations, the names of the arguments can be omitted.

Here's a short program that uses function pointers:

   1 /* Functional "hello world" program */
   2 
   3 #include <stdio.h>
   4 
   5 int
   6 main(int argc, char **argv)
   7 {
   8     /* function for emitting text */
   9     int (*say)(const char *);
  10 
  11     say = puts;
  12 
  13     say("hello world");
  14 
  15     return 0;
  16 }

117. Applications

Function pointers are not used as much in C as in functional languages, but there are many common uses even in C code.

117.1. Callbacks

The classic example is qsort, from the standard library:

   1 /* defined in stdlib.h */
   2 void qsort(void *base, size_t n, size_t size,
   3     int (*cmp)(const void *key1, const void *key2));

This is a generic sorting routine that will sort any array in place. It needs to know (a) the base address of the array; (b) how many elements there are; (c) how big each element is; and (d) how to compare two elements. The only tricky part is supplying the comparison, which could involve arbitrarily-complex code. So we supply this code as a function with an interface similar to strcmp.

   1 static int
   2 compare_ints(void *key1, void *key2)
   3 {
   4     return *((int *) key1) - *((int *) key2);
   5 }
   6 
   7 int
   8 sort_int_array(int *a, int n)
   9 {
  10     qsort(a, n, sizeof(*a), compare_ints);
  11 }

Other examples might include things like registering an error handler for a library, instead of just having it call abort() or something equally catastrophic, or providing a cleanup function for freeing data passed into a data structure.

117.2. Dispatch tables

Alternative to gigantic if/else if or switch statements. See page 234 of KernighanPike for a good example of this.

117.3. Iterators

See C/Iterators.

118. Closures

A closure is a function plus some associated state. A simple way to implement closures in C is to use a static local variable, but then you only get one. Better is to allocate the state somewhere and pass it around with the function. For example, here's a simple functional implementation of infinite sequences, that generalizes the example in AbstractDataTypes:

   1 /* a sequence is an object that returns a new value each time it is called */
   2 struct sequence {
   3     int (*next)(void *data);
   4     void *data;
   5 };
   6 
   7 typedef struct sequence *Sequence;
   8 
   9 Sequence
  10 create_sequence(int (*next)(void *data), void *data)
  11 {
  12     Sequence s;
  13 
  14     s = malloc(sizeof(*s));
  15     assert(s);
  16 
  17     s->next = next;
  18     s->data = data;
  19 
  20     return s;
  21 }
  22 
  23 int
  24 sequence_next(Sequence s)
  25 {
  26     return s->next(s->data);
  27 }

And here are some examples of sequences:

   1 /* make a constant sequence that always returns x */
   2 static int
   3 constant_sequence_next(void *data)
   4 {
   5     return *((int *) data);
   6 }
   7 
   8 Sequence
   9 constant_sequence(int x)
  10 {
  11     int *data;
  12 
  13     data = malloc(sizeof(*data));
  14     if(data == 0) return 0;
  15 
  16     *data = x;
  17 
  18     return create_sequence(constant_sequence_next, data);
  19 }
  20 
  21 /* make a sequence x, x+a, x+2*a, x+3*a, ... */
  22 struct arithmetic_sequence_data {
  23     int cur;
  24     int step;
  25 };
  26 
  27 static int
  28 arithmetic_sequence_next(void *data)
  29 {
  30     struct arithmetic_sequence_data *d;
  31 
  32     d = data;
  33     d->cur += d->step;
  34 
  35     return d->cur;
  36 }
  37 
  38 Sequence
  39 arithmetic_sequence(int x, int a)
  40 {
  41     struct arithmetic_sequence_data *d;
  42 
  43     d = malloc(sizeof(*d));
  44     if(d == 0) return 0;
  45 
  46     d->cur = x - a;             /* back up so first value returned is x */
  47     d->step = a;
  48 
  49     return create_sequence(arithmetic_sequence_next, d);
  50 }
  51 
  52 /* Return the sum of two sequences */
  53 static int
  54 add_sequences_next(void *data)
  55 {
  56     Sequence *s;
  57 
  58     s = data;
  59     return sequence_next(s[0]) + sequence_next(s[1]);
  60 }
  61 
  62 Sequence
  63 add_sequences(Sequence s0, Sequence s1)
  64 {
  65     Sequence *s;
  66 
  67     s = malloc(2*sizeof(*s));
  68     if(s == 0) return 0;
  69 
  70     s[0] = s0;
  71     s[1] = s1;
  72 
  73     return create_sequence(add_sequences_next, s);
  74 }
  75 
  76 /* Return the sequence x, f(x), f(f(x)), ... */
  77 struct iterated_function_sequence_data {
  78     int x;
  79     int (*f)(int);
  80 }
  81 
  82 static int
  83 iterated_function_sequence_next(void *data)
  84 {
  85     struct iterated_function_sequence_data *d;
  86     int retval;
  87 
  88     d = data;
  89 
  90     retval = d->x;
  91     d->x = d->f(d->x);
  92 
  93     return retval;
  94 }
  95 
  96 Sequence
  97 iterated_function_sequence(int (*f)(int), int x0)
  98 {
  99     struct iterated_function_sequence_data *d;
 100 
 101     d = malloc(sizeof(*d));
 102     if(d == 0) return 0;
 103 
 104     d->x = x0;
 105     d->f = f;
 106 
 107     return create_sequence(iterated_function_sequence_next, d);
 108 }

Note that we haven't worried about how to free the data field inside a Sequence, and indeed it's not obvious that we can write a generic data-freeing routine since we don't know what structure it has. The solution is to add more function pointers to a Sequence, so that we can get the next value, get the sequence to destroy itself, etc. When we do so, we have gone beyond building a closure to building an object.

119. Objects

Here's an example of a hierarchy of counter objects. Each counter object has (at least) three operations: reset, next, and destroy. To call the next operation on counter c we include c and the first argument, e.g. c->next(c) (one could write a wrapper to enforce this).

The main trick is that we define a basic counter structure and then extend it to include additional data, using lots of pointer conversions to make everything work.

   1 /* use preprocessor to avoid rewriting these */
   2 #define COUNTER_FIELDS  \
   3     void (*reset)(struct counter *);    \
   4     int (*next)(struct counter *);      \
   5     void (*destroy)(struct counter *);
   6 
   7 struct counter {
   8     COUNTER_FIELDS
   9 };
  10 
  11 typedef struct counter *Counter;
  12 
  13 /* minimal counter--always returns zero */
  14 /* we don't even allocate this, just have one global one */
  15 static void noop(Counter c) { ; }
  16 static int return_zero(Counter c) { return 0; }
  17 static struct counter Zero_counter = { noop, return_zero, noop };
  18 
  19 Counter
  20 make_zero_counter(void)
  21 {
  22     return &Zero_counter;
  23 }
  24 
  25 /* a fancier counter that iterates a function sequence */
  26 /* this struct is not exported anywhere */
  27 struct ifs_counter {
  28 
  29     /* copied from struct counter declaration */
  30     COUNTER_FIELDS
  31 
  32     /* new fields */
  33     int init;
  34     int cur;
  35     int (*f)(int);      /* update rule */
  36 };
  37 
  38 static void
  39 ifs_reset(Counter c)
  40 {
  41     struct ifs_counter *ic;
  42 
  43     ic = (struct ifs_counter *) c;
  44 
  45     ic->cur = ic->init;
  46 }
  47 
  48 static void
  49 ifs_next(Counter c)
  50 {
  51     struct ifs_counter *ic;
  52     int ret;
  53 
  54     ic = (struct ifs_counter *) c;
  55 
  56     ret = ic->cur;
  57     ic->cur = ic->f(ic->cur);
  58 
  59     return ret;
  60 }
  61 
  62 Counter
  63 make_ifs_counter(int init, int (*f)(int))
  64 {
  65     struct ifs_counter *ic;
  66 
  67     ic = malloc(sizeof(*ic));
  68 
  69     ic->reset = ifs_reset;
  70     ic->next = ifs_next;
  71     ic->destroy = (void (*)(struct counter *)) free;
  72 
  73     ic->init = init;
  74     ic->cur = init;
  75     ic->f = f;
  76 
  77     /* it's always a Counter on the outside */
  78     return (Counter) ic;
  79 }

A typical use might be

   1 static int
   2 times2(int x)
   3 {
   4     return x*2;
   5 }
   6 
   7 void
   8 print_powers_of_2(void)
   9 {
  10     int i;
  11     Counter c;
  12 
  13     c = make_ifs_counter(1, times2);
  14 
  15     for(i = 0; i < 10; i++) {
  16         printf("%d\n", c->next(c));
  17     }
  18 
  19     c->reset(c);
  20 
  21     for(i = 0; i < 20; i++) {
  22         printf("%d\n", c->next(c));
  23     }
  24 
  25     c->destroy(c);
  26 }


CategoryProgrammingNotes

120. C/PerformanceTuning

KernighanPike Chapter 7 gives an excellent overview of performance tuning. This page will be limited to some Linux-specific details and an example.

121. Timing under Linux

Use time, e.g.

$ time wc /usr/share/dict/words
 45378  45378 408865 /usr/share/dict/words

real    0m0.010s
user    0m0.006s
sys     0m0.004s

This measures "real time" (what it sounds like), "user time" (the amount of time the program runs), and "system time" (the amount of time the operating system spends supporting your program, e.g. by loading it from disk and doing I/O). Real time need to be equal to the sum of user time and system time, since the operating system may be simultaneously running other programs.

Particularly for fast programs, times can vary from one execution to the next, e.g.

$ time wc /usr/share/dict/words
 45378  45378 408865 /usr/share/dict/words

real    0m0.009s
user    0m0.008s
sys     0m0.001s
$ time wc /usr/share/dict/words
 45378  45378 408865 /usr/share/dict/words

real    0m0.009s
user    0m0.007s
sys     0m0.002s

This arises because of measurement errors and variation in how long different operations take. But usually the variation will not be much.

Note also that time is often a builtin operation of your shell, so the output format may vary depending on what shell you use.

122. Profiling with gprof

The problem with time is that it only tells you how much time your whole program took, but not where it spent its time. This is similar to looking at a program without a debugger: you can't see what's happening inside. If you want to see where your program is spending its time, you need to use a profiler.

For example, here's a short but slow program for calculating the number of primes less than some limit passed as argv[1]:

   1 #include <stdio.h>
   2 #include <stdlib.h>
   3 
   4 /* return 1 if n is prime, 0 otherwise */
   5 int
   6 isprime(int n)
   7 {
   8     int factor;
   9 
  10     if(n < 2) return 0;
  11     /* else */
  12     for(factor = 2; factor < n; factor++) {
  13         if(n % factor == 0) return 0;
  14     }
  15     /* else */
  16     return 1;
  17 }
  18 
  19 /* return number of primes < n */
  20 int
  21 countprimes(int n)
  22 {
  23     int i;
  24     int count;
  25 
  26     count = 0;
  27 
  28     for(i = 0; i < n; i++) {
  29         if(isprime(i)) count++;
  30     }
  31 
  32     return count;
  33 }
  34 
  35 int
  36 main(int argc, char **argv)
  37 {
  38     printf("%d\n", countprimes(atoi(argv[1])));
  39 
  40     return 0;
  41 }
countprimes.c

And now we'll time countprimes 40000:

$ time ./countprimes 40000
4203
./countprimes 40000  3.99s user 0.01s system 88% cpu 4.544 total

This is not too bad, but the reason I picked 40000 was that it didn't terminate fast enough for larger inputs. We'd like to see why it is taking so long, to have some idea what to try to speed up. So we'll compile it with the -pg option to gcc, which inserts profiling code that counts how many times each function is called and how long (on average) each call takes.

$ gcc -o countprimes -pg countprimes.c
$ time ./countprimes 40000
4203
./countprimes 40000  4.01s user 0.01s system 94% cpu 4.260 total

Hooray! We've made the program slightly slower (ignore the total time at the end---that's wall-clock time, and it only dropped because we got more of the CPU the second time we ran it). But we also just produced a file gmon.out that we can read with gprof:

$ gprof countprimes
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 99.95     40.18    40.18    40000     0.00     0.00  isprime
  0.05     40.20     0.02        1     0.02    40.20  countprimes

[much explanatory text deleted]

It looks like we are spending all of our time in isprime. Can we speed isprime up?

What isprime is doing is testing if a number n is prime by the most direct way possible: dividing by all numbers less than n until it finds a factor. That's a lot of divisions: if n is indeed prime, it's linear in n. Since division is a relatively expensive operation, the first thing to try is to get rid of some.

Here's a revised version of isprime:

   1 /* return 1 if n is prime, 0 otherwise */
   2 int
   3 isprime(int n)
   4 {
   5     int factor;
   6 
   7     if(n < 2) return 0;
   8     if(n % 2 == 0) return 0;
   9     /* else */
  10     for(factor = 3; factor < n; factor+=2) {
  11         if(n % factor == 0) return 0;
  12     }
  13     /* else */
  14     return 1;
  15 }

The trick is to check first if n is divisible by 2, and only test odd potential factors thereafter. Let's see how well this works:

$ gcc -o countprimes -pg countprimes.c
$ time ./countprimes 40000
4202
./countprimes 40000  2.01s user 0.01s system 96% cpu 2.095 total
$ gprof countprimes
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
100.00     20.09    20.09    40000     0.00     0.00  isprime
  0.00     20.09     0.00        1     0.00    20.09  countprimes

[...]

Twice as fast! And the answer is still the same, too---this is important.

Can we test even fewer factors? Suppose n has a non-trivial factor x. Then n equals x*y for some y which is also nontrivial. One of x or y will be no bigger than the square root of n. So perhaps we can stop when we reach the square root of n, which we can compute using the math library routine sqrt. Let's try it:

   1 #include <math.h>
   2 
   3 /* return 1 if n is prime, 0 otherwise */
   4 int
   5 isprime(int n)
   6 {
   7     int factor;
   8 
   9     if(n < 2) return 0;
  10     if(n % 2 == 0) return 0;
  11     /* else */
  12     for(factor = 3; factor < sqrt(n)+1; factor+=2) {
  13         if(n % factor == 0) return 0;
  14     }
  15     /* else */
  16     return 1;
  17 }

I added +1 to the return value of sqrt because the output of sqrt is not exact, and it would be embarrassing if I announced that 25 was prime because I stopped at 4.9999999997.

Using the math library not only requires including <math.h> but also requires compiling with the -lm flag after all .c or .o files, to link in the library routines:

$ gcc -o countprimes -pg countprimes.c -lm
$ time ./countprimes 40000
4202
./countprimes 40000  0.09s user 0.00s system 98% cpu 0.091 total
$ gprof countprimes| head -20
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  nameG
 96.88      0.31     0.31    40000     0.01     0.01  isprime
  3.12      0.32     0.01        1    10.00   320.00  countprimes

[...]

Whoosh.

Can we optimize further? Let's see what happens on a bigger input:

$ time ./countprimes 1000000
78497
./countprimes 1000000  7.01s user 0.02s system 92% cpu 7.590 total
$ gprof countprimes| head -8
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 99.32     24.96    24.96  1000000     0.00     0.00  isprime
  0.68     25.13     0.17        1     0.17    25.13  countprimes

This is still very good. Can we do better?

Maybe moving the sqrt out of the loop in isprime will make a difference:

   1 /* return 1 if n is prime, 0 otherwise */
   2 int
   3 isprime(int n)
   4 {
   5     int factor;
   6     int root;
   7 
   8     if(n < 2) return 0;
   9     if(n % 2 == 0) return 0;
  10     /* else */
  11     root = sqrt(n)+1;
  12     for(factor = 3; factor < root; factor+=2) {
  13         if(n % factor == 0) return 0;
  14     }
  15     /* else */
  16     return 1;
  17 }

$ gcc -o countprimes -pg countprimes.c -lm
$ time ./countprimes 1000000
78497
./countprimes 1000000  7.01s user 0.01s system 94% cpu 7.419 total

Nope. It looks like the compiler was smart enough to do this already.

What if we get rid of the call to sqrt and test if factor * factor <= n instead? This way we could dump the math library:

   1 /* return 1 if n is prime, 0 otherwise */
   2 int
   3 isprime(int n)
   4 {
   5     int factor;
   6 
   7     if(n < 2) return 0;
   8     if(n % 2 == 0) return 0;
   9     /* else */