Defensive Programming Techniques

This topic gives an overview of defensive programming techniques that can make your testing and debugging much easier.

PREAMBLE

The best way to achieve working programs is to write the programs very carefully in the first place.

Try to fully understand the problem that your program is meant to solve before beginning to write code. (However, one's understanding often comes only during the coding process. If that is the situation, you should seriously consider throwing away all your code and starting again. In any case, a common practice is to write a prototype program first -- use it to understand the problem and the solution -- then discard it and create the production version of the program.)
Never write code in a hurry. The time you save will be lost many times over when you debug the program, so think about every line of code.
If you have a willing partner, read and explain your program to him or her. This person does not necessarily have to be someone who can program. Just trying to explain what you have done will very often show up your misconceptions about the problem or about how you thought the program should work.

DEFENSIVE PROGRAMMING

Only a naive programmer (or a truly brilliant one) believes that his or her program will work first time and be totally correct. Your program will almost certainly contain errors, and you will be able to locate them much more rapidly if you adopt some simple programming strategies. Although these strategies may slow you down a bit, that is probably a good thing in any case!

(Programming examples follow in later sections.)

Do not use programming tricks (at least not in the first version of the program). Tricks are, almost by definition, hard to understand, and you are likely to get them wrong. Even if you get them right, the next programmer who comes along and has to maintain the code may break your working code because he or she does not understand how it works.
Split complicated expressions and program logic into a series of simpler calculations. You will find the resulting code easier to understand and to debug. Any resulting loss of program efficiency is nothing compared to the time you take to debug your program. (This advice and the previous one correspond to the KISS principle -- "keep it simple, stupid!".)
Initialize all local variables at the point of declaration (or in the first few statements of the function), unless it is obvious from visual inspection that the initialization is redundant. Initialization is especially important for pointer variables. I.e., declare every local pointer variable like this:
```
           ITEM *p = NULL; 
```
Note that the C standard requires global (and static) variables that have no explicit initializer to be initialized to zero, so every conforming compiler does this for you. It is still good practice to give global variables explicit initial values, because doing so documents the starting value your program relies on.

Initialize all fields of a dynamically allocated structure immediately after the call to malloc. E.g., write your code like this:

		np = malloc(sizeof(LISTELEMENT));
		if (np == NULL) {            /* always check (see below) */
		    fprintf(stderr, "malloc failed!\n");  abort();
		}
		/* initialize all fields to neutral values */
		np->name = NULL;
		np->link = NULL;
		np->age = 0;

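The calloc function allocates memory and sets every byte of it to zero. Therefore you might consider always using calloc instead of malloc. (Strictly, the C standard does not guarantee that all-bits-zero represents a NULL pointer or a floating-point 0.0, although it does on all common platforms.) E.g., the above code could be rewritten as: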
```
          np = calloc(1, sizeof(LISTELEMENT));
          if (np == NULL) {          /* always check (see below) */
              fprintf(stderr, "calloc failed!\n");  abort();
          }
```
Insert code at the beginning of every function to test the validity of the input arguments. The recommended approach is to use the assert macro in C, as explained below.
Always check the return code from a function that provides an indication of success or failure. (Unless, perhaps, the function's failure would have no further impact on your program.) The UNIX man pages for the C library functions give complete details of each function's return values and error codes.
Design your own functions so that when some error condition is encountered, they either report the error to the caller with a return status code or they print an error message and halt. They should never ignore the error and quietly return to the caller.
Insert code to check the range of array indexes (unless it is obvious from visual inspection of the code that the index must be valid). One approach is to use the assert macro (see below). Another approach is to define your own macros for accessing arrays. An example is given below.
Insert code to check that a pointer is not NULL before you dereference that pointer (i.e. before you try to access storage that it is supposed to refer to). See the programming example below.
In the initial version of the program, do not de-allocate memory (with the free function) that has been previously allocated by malloc (or its cousins, calloc & realloc). Leave the calls to free commented out until your program has passed at least its first set of tests. (Accidental re-use of freed memory leads to mysterious errors; you should avoid having to track these errors down until other problems have been eliminated.) When your program does free storage, reset any pointers to that storage to NULL before continuing. I.e., after
```
          free(np); 
```
add the statement:
```
          np = NULL; 
```
unless variable np is re-used almost immediately.
In any loop that is not controlled by a counter with explicit limits, insert a counter and a check that the counter does not exceed your wildest, most dire, prediction for the number of iterations required. See below for an example.
Always compile with the -g flag and without the -O (optimize) flag. The -g flag makes the compiler include debugging information, so that a debugger such as dbx can be used when your program goes wrong. The -O flag often causes the C compiler to re-order the statements in your program, and this can make debugging very difficult. (Some older C compilers ignore -O when -g is also supplied, but modern compilers such as gcc and clang honor both flags, so do not rely on this.)
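The allocation, initialization and de-allocation advice above can be combined into a pair of helper functions. The following is a minimal sketch, not a definitive implementation: the LISTELEMENT layout and the names newListElement and freeListElement are hypothetical, chosen to match the fields used in the earlier snippets.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical list element with the fields used in the text. */
typedef struct listelement {
    const char *name;
    struct listelement *link;
    int age;
} LISTELEMENT;

/* Allocate an element, check the result, and initialize every field. */
LISTELEMENT *newListElement(const char *name, int age)
{
    LISTELEMENT *np = malloc(sizeof(LISTELEMENT));
    if (np == NULL) {
        fprintf(stderr, "malloc failed!\n");
        abort();
    }
    /* initialize all fields before the element is used */
    np->name = name;
    np->link = NULL;
    np->age = age;
    return np;
}

/* Free an element and reset the caller's pointer to NULL, so that a
   later accidental use can be caught by a NULL check instead of
   silently touching freed memory. */
void freeListElement(LISTELEMENT **npp)
{
    if (npp != NULL) {
        free(*npp);     /* free(NULL) is a harmless no-op */
        *npp = NULL;
    }
}
```

Passing the address of the pointer (as in `freeListElement(&np);`) is what lets the helper reset the caller's own variable to NULL.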

CHECKING RETURN CODES FROM LIBRARY FUNCTIONS

The memory allocation functions (malloc, calloc, realloc) are particularly important for return code checking. Always write your code like this:

		p = calloc(numElements, ELEMENTSIZE);
		if (p == NULL) {
		    fprintf(stderr, "calloc failed!\n");  abort();
		}
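Because every allocation needs the same check, it is common to wrap it in a small function. The sketch below assumes a hypothetical helper named checkedCalloc; it is not a standard library function.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: return zeroed memory or stop the program. */
void *checkedCalloc(size_t count, size_t size)
{
    void *p = calloc(count, size);
    if (p == NULL) {
        /* %zu prints a size_t value (C99 and later) */
        fprintf(stderr, "calloc(%zu, %zu) failed!\n", count, size);
        abort();
    }
    return p;
}
```

Note that calloc may legitimately return NULL for a zero-size request, so this sketch also aborts in that case; callers should avoid asking for zero elements.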

The functions for accessing files are also very important (because they can fail for reasons quite outside the program's control). Therefore, use code like this to open a file:

		fp = fopen("mynewfile", "w");
		if (fp == NULL) {
		    perror("mynewfile");  abort();
		}

and it is a good idea to also check that writes to files succeed ...

		if (fprintf(fp, "%d", n) < 0) {
		    perror("mynewfile");  abort();
		}

It is conceivable that the disk becomes full and causes the write to fail, but other errors are also possible. [The perror() function, used above, is explained next.]
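In line with the advice that your own functions should report errors to their caller rather than ignore them, these file checks can be packaged in a function that returns a status code instead of calling abort. This is a sketch; the name saveNumber is hypothetical. It also checks fclose, because output is buffered and some write errors (a full disk, for example) may only be reported when the buffer is flushed.

```c
#include <stdio.h>

/* Hypothetical helper: write n to the named file.
   Returns 0 on success, or -1 after printing the reason on failure. */
int saveNumber(const char *filename, int n)
{
    FILE *fp = fopen(filename, "w");
    if (fp == NULL) {
        perror(filename);
        return -1;
    }
    if (fprintf(fp, "%d\n", n) < 0) {
        perror(filename);
        fclose(fp);              /* still release the stream */
        return -1;
    }
    /* Buffered data is written here; fclose returns EOF if that fails. */
    if (fclose(fp) == EOF) {
        perror(filename);
        return -1;
    }
    return 0;
}
```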

USING THE PERROR FUNCTION

Many of the file handling functions in the C library store an error reason code in an integer variable named errno when they discover an error situation. Therefore, if a function returns an error status code, you can usually check the value of errno to find out more about what went wrong.

The following is plausible coding:

	#include <errno.h>	/* declares errno; never declare it yourself */
	...
	fp = fopen("mynewfile", "w");
	if (fp == NULL) {
	    fprintf(stderr, "mynewfile: error code = %d\n", errno);
	    abort();
	}

However, textual error messages are much more useful than numeric error codes. The perror() function looks up the current value of errno and writes the corresponding error description to the standard error stream, prefixed with the text you supply and a colon. This text is typically the name of the file that we are having trouble using. In the code above, we should replace the fprintf call with

	perror("mynewfile");

Note that if a read or write on an open stream fails and you want your program to continue execution after reporting the error (rather than calling abort), you may have to clear the error indicator associated with that stream before you carry on. E.g.

	clearerr(fp);

should be called before your program attempts further operations on the stream referenced by fp; otherwise the error indicator stays set and ferror() will keep reporting the old error.
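The errno handling described above can be sketched as a small function that both prints the reason and hands the reason code back to its caller. The name openForReading is hypothetical. The value of errno is saved immediately, because later library calls may change it; on a POSIX system, opening a file in a directory that does not exist yields ENOENT.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: open filename for reading.
   On success, stores the stream in *fpOut and returns 0.
   On failure, prints the reason with perror, leaves *fpOut NULL
   and returns a nonzero error code. */
int openForReading(const char *filename, FILE **fpOut)
{
    errno = 0;                       /* clear any stale value first */
    *fpOut = fopen(filename, "r");
    if (*fpOut == NULL) {
        int savedErrno = errno;      /* save before other calls can change it */
        perror(filename);            /* prints "filename: <reason>" */
        /* ISO C does not require fopen to set errno (POSIX does),
           so fall back to -1 if it was left at zero. */
        return savedErrno != 0 ? savedErrno : -1;
    }
    return 0;
}
```

If you need the description as a string (for example, to build your own message), the standard strerror function returns the same text that perror prints.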

USING THE ASSERT MACRO

The <assert.h> header file defines a macro named assert. It can be used as in the example below.

Example of use:

          #include <assert.h>
          ...
          void foo( int N, float *floatArray ) {
              assert( N > 0 && floatArray != NULL);
              ...
          }

When this code is compiled, the assert macro expands into code roughly equivalent to this:

          if (!(N > 0 && floatArray != NULL)) {
              fprintf(stderr, "%s:%d: assertion failed: %s\n",
                  __FILE__, __LINE__, "N > 0 && floatArray != NULL");
              abort();
          }

The error message includes the file name and line number where the assertion failed.
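To make that expansion concrete, here is a hand-written macro that behaves much like assert. It is a sketch: REQUIRE and sumArray are hypothetical names, and the exact message printed by the real assert varies between compilers. The `#cond` operator turns the tested expression into a string, and the `do { ... } while (0)` wrapper makes the macro behave like a single statement, so it is safe after an unbraced `if`.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical assert-like macro. Unlike assert, it is NOT
   switched off by defining NDEBUG. */
#define REQUIRE(cond)                                             \
    do {                                                          \
        if (!(cond)) {                                            \
            fprintf(stderr, "%s:%d: requirement failed: %s\n",   \
                    __FILE__, __LINE__, #cond);                   \
            abort();                                              \
        }                                                         \
    } while (0)

/* A version of foo from the example above: sum N floats. */
float sumArray(int N, const float *floatArray)
{
    float total = 0.0f;
    int i;

    REQUIRE(N > 0 && floatArray != NULL);   /* validate the arguments */
    for (i = 0; i < N; i++)
        total += floatArray[i];
    return total;
}
```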
If, after comprehensive testing of the program, you decide that the assert macro invocations are unnecessary, you can disable them by defining the macro NDEBUG. You can do this by inserting the line:
```
          #define NDEBUG 
```
before the #include <assert.h> line in each file that uses the assert macro. Alternatively, you can define NDEBUG by supplying the command-line option -DNDEBUG in the compile command. For example:
```
          cc -c -DNDEBUG foo.c
```
It is shortsighted practice to remove the assert macros from the source code with an editor. Leave them in -- they serve as documentation for some of your program assumptions and they may turn out to be useful if the program needs to be modified later.

CHECKING ARRAY INDEXES IN C

Standard C does not require array subscripts to be checked, and C compilers do not generate such checks by default (though some C development environments, such as CodeCenter, provide that capability, as do run-time checking options such as AddressSanitizer in gcc and clang). If you want such checking in a C program, you must do it yourself. [This is a good reason to consider switching to the C++ language, because a C++ array class can easily overload the subscript operator to perform range checking.]

The easiest approach in C is to define macros for accessing the arrays.

For example, suppose that our program declares this array:

	int arr[100];

and there are subsequent uses of the array, such as

	arr[i] = arr[j]+1;

We could add the following function to our program:

	void checkIndex( int index, int size ) {
	    if (index < 0 || index >= size) {
		fprintf(stderr, "index out of range\n");
		abort();
	    }
	}

and expand the array declaration by adding a macro definition alongside:

	int arr[100];
	#define ARR(x)	arr[checkIndex((x),100),(x)]

Now, our uses of the array can be written like this:

	ARR(i) = ARR(j) + 1;

You would probably like some way of disabling the array checking after the program has been debugged and tested. We can make disabling easy by writing the macro definition like this:

	#ifdef ARRAY_DEBUG
	#define ARR(x)	arr[checkIndex((x),100),(x)]
	#else
	#define ARR(x)	arr[x]
	#endif
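One weakness of the ARR macro above is that its argument appears twice in the expansion, so a call such as ARR(i++) would check one index and then use another. The following sketch avoids that by having the checking function return the index it has just checked, and it also reports where the bad access happened. The helper checkedIndex and the functions storeValue, loadValue and copyPlusOne are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

#define ARRAY_DEBUG              /* delete this line to turn checking off */
#define ARR_SIZE 100

int arr[ARR_SIZE];

#ifdef ARRAY_DEBUG
/* Check an index and return it, so the macro evaluates x only once. */
static int checkedIndex(int index, int size, const char *file, int line)
{
    if (index < 0 || index >= size) {
        fprintf(stderr, "%s:%d: index %d out of range 0..%d\n",
                file, line, index, size - 1);
        abort();
    }
    return index;
}
#define ARR(x)  arr[checkedIndex((x), ARR_SIZE, __FILE__, __LINE__)]
#else
#define ARR(x)  arr[x]
#endif

/* Hypothetical functions that access the array only through ARR. */
void storeValue(int i, int v) { ARR(i) = v; }
int loadValue(int i) { return ARR(i); }
void copyPlusOne(int i, int j) { ARR(i) = ARR(j) + 1; }
```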

CHECKING POINTERS IN C

Pointer variables with undefined values, or pointers that have become invalid because they reference memory that has been de-allocated, can lead to very mysterious errors. Almost any imaginable error symptom can appear if your program writes to memory via a bad pointer value.

Again, in C++ it is easy to write a pointer-like class whose * (dereference) and -> operators check the pointer before it is used. In C, we should define a macro. Here is a simple example.

If the program contains a declaration like this:

          LISTELEMENT	*np;

and a use like this:

          np->name = "Mr Smith";

we can add the following function and macro definition to our program:

	void checkPointer(void *p) {
	    if (p == NULL) {
		fprintf(stderr,
		    "Attempt to dereference NULL pointer\n");
		abort();
	    }
	}
 
	#define PCHECK(x)	(checkPointer(x),(x))

and our example use of the pointer should be coded like this:

		PCHECK(np)->name = "Mr Smith";

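Note that this check detects only NULL pointers; it cannot tell that a non-NULL pointer refers to freed or uninitialized memory. That is why all pointer variables should be initialized to NULL when they are declared, and reset to NULL after a call to free(): the checking strategy is effective only if invalid pointers are NULL.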
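The checking function can also be told where it was called from, which makes the error message point straight at the failing dereference. This is a sketch under assumptions: checkPointerAt and renameElement are hypothetical names, and the LISTELEMENT layout is assumed from the fields used in the text.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical layout, matching the fields used in the text. */
typedef struct listelement {
    const char *name;
    struct listelement *link;
    int age;
} LISTELEMENT;

/* Stop with a message naming the source location if p is NULL. */
void checkPointerAt(const void *p, const char *file, int line)
{
    if (p == NULL) {
        fprintf(stderr, "%s:%d: attempt to dereference NULL pointer\n",
                file, line);
        abort();
    }
}

/* As in the original, x is evaluated twice: pass plain variables only. */
#define PCHECK(x)  (checkPointerAt((x), __FILE__, __LINE__), (x))

/* Hypothetical example of a checked dereference. */
void renameElement(LISTELEMENT *np, const char *newName)
{
    PCHECK(np)->name = newName;
}
```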

COUNTING LOOP ITERATIONS

Infinite loops can be aggravating to debug. Remember that you can interrupt a non-terminating program with the Quit signal, generated by typing ^\ (control-backslash), and that normally produces a core dump (if core dumps are enabled on your system) which can be analyzed with dbx. But you will find the program much easier to debug if it contains no indefinite loops.

If you have a loop like:

	while( some_condition ) {
	    ...
	}

you can convert it into the following form:

	#define MAX_ITERATIONS	100000
	int limit;
 
	...
	limit = 0;
	while( some_condition ) {
	    if (++limit > MAX_ITERATIONS) {
		fprintf(stderr, "Infinite loop at line %d\n",
		    __LINE__);  abort();
	    }
	    ...
	}

[See below for an explanation of __LINE__.]
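A common real case is walking a linked list: if a bug has made the list circular, the walk never ends. The sketch below applies the counter technique to that case; the NODE type and the listLength function are hypothetical. Because the loop already counts nodes, the length itself serves as the iteration counter.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_ITERATIONS 100000

/* Hypothetical singly linked list node. */
typedef struct node {
    struct node *link;
} NODE;

/* Count the nodes in a list. If the list has become circular, the
   counter turns a hang into an immediate, located failure. */
int listLength(const NODE *head)
{
    int length = 0;              /* also serves as the iteration counter */
    const NODE *np = head;

    while (np != NULL) {
        if (++length > MAX_ITERATIONS) {
            fprintf(stderr, "%s:%d: probable infinite loop\n",
                    __FILE__, __LINE__);
            abort();
        }
        np = np->link;
    }
    return length;
}
```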

MISCELLANEOUS

ANSI C compilers provide two predefined macros, __LINE__ and __FILE__. (These names begin and end with two underscores.) The __LINE__ macro expands to the current line number in the current source file; the __FILE__ macro expands to a string literal containing the name of that file. You can use these macros to make your error messages more informative.

(The assert macro, as defined in <assert.h>, will print the file name and line number along with the text of the assertion test that has failed.)
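Because __FILE__ and __LINE__ are expanded where a macro is used, wrapping them in your own macro lets every error message carry the location of the call. A minimal sketch, assuming a hypothetical helper formatWhere and macro WHERE:

```c
#include <stdio.h>

/* Hypothetical helper: write "file:line: message" into buf.
   Returns what snprintf returns (the length of the full message). */
int formatWhere(char *buf, size_t size, const char *file, int line,
                const char *msg)
{
    return snprintf(buf, size, "%s:%d: %s", file, line, msg);
}

/* Fill in the caller's own file name and line number automatically. */
#define WHERE(buf, size, msg) \
    formatWhere((buf), (size), __FILE__, __LINE__, (msg))
```

For example, formatWhere(buf, sizeof buf, "foo.c", 12, "index out of range") stores "foo.c:12: index out of range" in buf, while WHERE(buf, sizeof buf, "index out of range") substitutes the file name and line number of the line containing the WHERE call.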