1. Integer types

In order to declare a variable, you have to specify a type, which controls both how much space the variable takes up and how the bits stored within it are interpreted in arithmetic operations.

The standard C integer types are:

Name

Typical size

Signed?

char

8 bits

Unspecified

short

16 bits

Yes

int

32 bits

Yes

long

32 bits

Yes

The typical size is for 32-bit architectures like the Intel i386. Some 64-bit machines will have 64-bit ints and longs, and some prehistoric computers had 16-bit ints. Particularly bizarre architectures might have even wilder bit sizes, but you are not likely to see this unless you program vintage 1970s supercomputers. Some compilers also support a long long type that is usually twice the length of a long (e.g. 64 bits on i386 machines); this may or may not be available if you insist on following the ANSI specification strictly. The general convention is that int is the most convenient size for whatever computer you are using and should be used by default.

Whether a variable is signed or not controls how its values are interpreted. In signed integers, the first bit is the sign bit and the rest are the value in 2's complement notation; so for example a signed char with bit pattern 11111111 would be interpreted as the numerical value -1 while an unsigned char with the same bit pattern would be 255. Most integer types are signed unless otherwise specified; an n-bit integer type has a range from -2n-1 to 2n-1-1 (e.g. -32768 to 32767 for a short.) Unsigned variables, which can be declared by putting the keyword unsigned before the type, have a range from 0 to 2n-1 (e.g. 0 to 65535 for an unsigned short).

For chars, whether the character is signed (-128..127) or unsigned (0..255) is at the whim of the compiler. If it matters, declare your variables as signed char or unsigned char. For storing actual characters that you aren't doing arithmetic on, it shouldn't matter.

1.1. Integer constants

Constant integer values in C can be written in any of four different ways:

Except for character constants, you can insist that an integer constant is unsigned or long by putting a u or l after it. So 1ul is an `unsigned long version of 1.  By default integer constants are (signed) int`s.

1.2. Integer operations

The usual + (addition), - (negation or subtraction), and * (multiplication) operators work on integers pretty much the way you'd expect. The only caveat is that if the result lies outside of the range of whatever variable you are storing it in, it will be truncated instead of causing an error:

   1     unsigned char c;
   2 
   3     c = -1;             /* sets c = 255 */
   4     c = 255 + 255;      /* sets c = 254 */
   5     c = 256 * 1772717;  /* sets c = 0 */

This can be a source of subtle bugs if you aren't careful. The usual giveaway is that values you thought should be large positive integers come back as random-looking negative integers.

Division (/) of two integers also truncates: 2/3 is 0, 5/3 is 1, etc. For positive integers it will always round down. If either the numerator or denominator is negative, the behavior is unpredictable and depends on what your processor does---in practice this means you should never use / if one or both arguments might be negative. There is also a remainder operator % with e.g. 2%3 = 2, 5%3 = 2, 27 % 2 = 1, etc. It is also unpredictable when fed negative values, although y == x*(y/x) + y%x should always be true.

In addition to the arithmetic operators, integer types support bitwise logical operators that apply some Boolean operation to all the bits of their arguments in parallel. What this means is that the i-th bit of the output is equal to some operation applied to the i-th bit(s) of the input(s). The bitwise logical operators are ~ (bitwise negation: used with one argument as in ~0 for the all-1's binary value), & (bitwise AND), '|' (bitwise OR), and '^' (bitwise XOR, i.e. sum mod 2). These are mostly used for manipulating individual bits or small groups of bits inside larger words, as in the expression x & 0x0f, which strips off the bottom four bits stored in x. Two additional logical operators are the shift operators << and >>; see KernighanRitchie for more about these.

To add to the confusion, there are also three logical operators that work on the truth-values of integers, where 0 is defined to be false and anything else is defined by be true. These are && (logical AND), ||, (logical OR), and ! (logical NOT). The result of any of these operations is always 0 or 1 (so !!x, for example, is 0 if x is 0 and 1 if x is anything else). The && and || operators evaluate their arguments left-to-right and ignore the second argument if the first determines the answer (this is the only place in C where argument evaluation order is specified); so

   1     0 && execute_programmer();
   2     1 || execute_programmer();

is in a very weak sense perfectly safe code to run.

Yet another logical operator is the ternary operator ?:, where x ? y : z equals the value of y if x is nonzero and z if x is zero. Like && and ||, it only evaluates the arguments it needs to:

   1     fileExists(badFile) ? deleteFile(badFile) : createFile(badFile);

Most uses of ?: are better done using an if-then-else statement (C/ControlStructures).

Logical operators usually operate on the results of relational operators or comparisons: these are '==' (equality), '!=' (inequality), '<=' (less than or equal to) and '>=' (greater than or equal to). So, for example,

    if(size >= MIN_SIZE && size <= MAX_SIZE) {
        puts("just right");
    }

tests if size is in the (inclusive) range [MIN_SIZE..MAX_SIZE].

Beware of confusing == with =. The code

   1     /* DANGER! DANGER! DANGER! */
   2     if(x = 5) {
   3         ...

is perfectly legal C, and will set x to 5 rather than testing if it's equal to 5. Because 5 happens to be nonzero, the body of the if statement will always be executed.

1.3. Alignment

Modern CPU architectures typically enforce alignment restrictions on multi-byte values, which mean that the address of an int or long typically has to be a multiple of 4. This is an effect of the memory being organized as groups of 32 bits that are written in parallel instead of 8 bits. Such restrictions are not obvious when working with integer-valued variables directly, but will come up when we talk about pointers in C/Pointers.


CategoryProgrammingNotes

  1. Certain ancient versions of C ran on machines with a different character set encoding, like EBCDIC. The C standard does not guarantee ASCII encoding. (1)

C/IntegerTypes (last edited 2007-12-25 23:42:06 by localhost)