Let's start with a very simple case, a null-pointer dereference. This straightforward problem will serve to introduce the debugger and the information available, and to show a "normal" case for comparison with the more complex cases later. Take the following program:
```c
/* 1 */ #include <stdio.h>
/* 2 */ #include <stdlib.h>
/* 3 */
/* 4 */ void f2(void)
/* 5 */ {
/* 6 */     int *p=0;
/* 7 */
/* 8 */     *p=0xd1e;
/* 9 */ }
/* 10 */
/* 11 */ void f1(void)
/* 12 */ {
/* 13 */     f2();
/* 14 */ }
/* 15 */
/* 16 */ int main (void)
/* 17 */ {
/* 18 */     f1();
/* 19 */     printf("done\n");
/* 20 */
/* 21 */     return 0;
/* 22 */ }
```

The line numbers are in comments so you can cut and paste it. Suppose it's called foo.c. I compile and run it like so:
```
$ gcc -g -Wall -o foo foo.c
$ ./foo
Segmentation fault
```

I'm told that Windows/MSVC will tell you that you dereferenced a null pointer, but all you get on Linux is a segfault. So, let's use the debugger to see what's going on:
```
$ gdb foo
(gdb) run
Starting program: /tmp/foo

Program received signal SIGSEGV, Segmentation fault.
0x080483f0 in f2 () at foo.c:8
8           *p=0xd1e;
(gdb) print p
$1 = (int *) 0x0
(gdb) backtrace
#0  0x080483f0 in f2 () at foo.c:8
#1  0x08048403 in f1 () at foo.c:13
#2  0x08048413 in main () at foo.c:18
```

When we run the program, we see that it dies with a segmentation fault on line 8. Seeing that this line assigns a value to *p, we can look at the value of p, and see that it is null.
An extremely useful feature of the debugger is the stack trace. Here it shows that main() called f1(), which called f2(), along with the line number of each call. Using the up and down commands in gdb (or by clicking in the backtrace window of a graphical debugger such as ddd), you can move to each function's frame and inspect its variables. This is particularly useful in C++ programming, where the actual segmentation fault tends to occur three levels deep inside the STL, and you need to move up quite a few calls to reach your own code and figure out what the problem really is.
Now let's try something harder: overrunning an array on the stack. Here's our new program:
```c
#include <stdio.h>
#include <stdlib.h>

void f2(void)
{
    int a[8];
    int i;

    for (i=0; i<16; i++)
        a[i]=0;
}

void f1(void)
{
    f2();
}

int main (void)
{
    f1();
    printf("done\n");

    return 0;
}
```

When we run this one, however, the results are somewhat more perplexing:
```
(gdb) run
Starting program: /tmp/foo

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) backtrace
#0  0x00000000 in ?? ()
```

There aren't any function names or line numbers! That is because the memory we overwrote when we ran past the end of the a[] array is exactly the memory that held this information: the saved return address, which the loop filled with zeros (hence the address 0x00000000).
On many processor architectures, including i386, the stack grows downward, toward lower addresses: when something is pushed onto the stack, the stack pointer is decreased. When f1() calls f2(), the return address is pushed onto the stack. At the beginning of f2(), space for its local variables is allocated on the stack by decreasing the stack pointer further. This means the local variables sit just below the saved frame pointer and return address, so when we write past the end of a local array, we clobber those saved addresses.
It's important to note that the segmentation fault usually does NOT occur when we write off the end of the array; after all, we're just writing to other memory that's part of our process's stack. It occurs when f2() returns and jumps to the overwritten return address, which now points to memory that isn't mapped (here, address 0). (With care, it's possible to overwrite the return address with a chosen value instead of garbage; this technique is used deliberately to exploit security holes.)
For a slight variation, reverse the order of the declarations of i and a[] in the example and try again. Depending on how your compiler lays out the stack, you may find that this time, instead of generating a segmentation fault, the program runs forever. With the variables in the other order, the loop variable i is placed just past the end of the a[] array, so one of the out-of-range elements (a[8], if the compiler leaves no gap between them) is i itself. Every time the loop reaches that index, it overwrites i with 0, and the loop starts over near the beginning of the array.
For the last case, let's allocate the array on the heap with malloc() instead:

```c
#include <stdio.h>
#include <stdlib.h>

void f2(void)
{
    int i;
    int *a=malloc(sizeof(int)*8);

    for (i=0; i<16; i++)
        a[i]=0;
}

void f1(void)
{
    f2();
}

int main (void)
{
    f1();
    printf("done\n");

    return 0;
}
```

Once again, we're overrunning the end of the array. If you compile and run this program, however, you will most likely find that it appears to run successfully.
Because the memory for our array is allocated with malloc(), it comes from a different area of memory (called the heap) and does not interfere with the addresses on the stack. Nor is there a segmentation fault when we write past the end of the array, because that memory still belongs to the process. A process can't get memory from the operating system a few bytes at a time; it must request it in whole pages, which malloc() then divides into smaller blocks. On Linux/i386 the page size is 4096 bytes. Thus, a small memory overrun is unlikely to cause a segmentation fault; it just silently corrupts nearby heap memory and leads to incorrect behavior. If you increase the end value in the for() statement enough (try increasing it from 16 to 2000, and go higher if that isn't enough), you will eventually write past the last page the heap has mapped and get a segmentation fault.
How can you debug a heap overrun like this? Finding this sort of problem generally requires a helper library. Such a library replaces malloc() so that each allocated block is placed right at the end of a page, followed by a page that is made inaccessible; as soon as you write past the end of the block, you get a segmentation fault. One of the simplest and most common of these libraries is Bruce Perens' Electric Fence. Assuming you have it installed, all you need to do is recompile your program like so:
```
$ gcc -g -Wall -o foo foo.c -lefence
```

When you then run your program, you'll find that you get a segfault right at the time of the memory overrun, and you can observe the stack trace and other information in your debugger as usual. You can read the libefence(3) manpage for more information about the library, such as how to make it check for overrunning the beginning of a memory block instead of the end.
That's it for now! Those three simple cases cover the majority of the memory problems you're likely to have in C or C++, and more complicated problems will often be variations on these. Now that you know what to look for, you should have a much easier time finding memory problems in your own programs.