As I wrote in the last chapter, this article only covers fuzzing with AFL when the source code is available. To demonstrate, I took an old open-source program that I found on GitHub called ccalc. As the name suggests, it is a simple calculator written in C. I chose it because fuzzing was likely to find many crashes in it easily, and as you will see, this hypothesis turned out to be correct.
First things first, we need to determine a target. I decided to fuzz the main() function instead of picking some specific function to fuzz. The reasoning is that the program parses the raw input in a series of functions before it hands the processed input over to the functions that do the actual calculations. Since the program filters out bad input during parsing, fuzzing an inner function directly would not be a fair test: it would receive input that could never reach it in real use.
The original main() looks more or less like this (I abbreviated it a little):
Generally, it contains two branches: it either gets input through argc, argv, or through standard input (stdin). I’m not sure why, but the creator of AFL explicitly recommends against fuzzing through argc, argv, and so do all the other sources online. So I changed it a little: I first removed the section that deals with input coming from argc, argv, and then I tried to optimize it as far as possible. After all, this function is executed each time the fuzzer tries another input. Small inefficiencies can therefore lead to serious slowdowns. Here’s the end-result:
Notice the loop I added. It instructs AFL to use persistent mode. In AFL’s default mode, each time AFL runs the program, it uses the fork() syscall to create a new subprocess. This syscall has considerable overhead, which slows down the whole fuzzing process. When persistent mode is used, the fuzzer reuses the same process for many inputs. The only requirement is to wipe all variables and buffers each time the loop runs.
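To make the idea of the loop concrete, here is a minimal sketch of a persistent-mode harness. It is not the actual harness.c: evaluate_line is a hypothetical stand-in for ccalc's parser, and fuzz_loop, INPUT_MAX and the iteration limit of 1000 are names and values I picked for illustration. The `__AFL_LOOP` macro is provided by AFL's LLVM-based compiler wrapper (afl-clang-fast), so the sketch falls back to a single iteration when built with any other compiler.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* __AFL_LOOP only exists when compiling with afl-clang-fast. With any other
 * compiler, fall back to a single iteration so the file still builds. */
#ifndef __AFL_LOOP
static int fallback_iterations = 1;
#define __AFL_LOOP(max) (fallback_iterations-- > 0)
#endif

#define INPUT_MAX 4096 /* assumed upper bound on the size of one test case */

/* HYPOTHETICAL stand-in for ccalc's parser: evaluates "a op b".
 * Returns 0 on success, -1 on malformed input or division by zero. */
int evaluate_line(const char *line, double *result)
{
    double a, b;
    char op;

    if (sscanf(line, "%lf %c %lf", &a, &op, &b) != 3)
        return -1;
    switch (op) {
    case '+': *result = a + b; return 0;
    case '-': *result = a - b; return 0;
    case '*': *result = a * b; return 0;
    case '/':
        if (b == 0.0)
            return -1;
        *result = a / b;
        return 0;
    default:
        return -1;
    }
}

/* Reads one test case per iteration from fd and evaluates it.
 * Returns the number of test cases processed by this process. */
int fuzz_loop(int fd)
{
    static char input[INPUT_MAX];
    int cases = 0;

    while (__AFL_LOOP(1000)) {
        double result = 0.0;
        ssize_t len;

        /* Wipe whatever the previous iteration left behind. */
        memset(input, 0, sizeof input);

        len = read(fd, input, sizeof input - 1);
        if (len < 0)
            len = 0;
        input[len] = '\0';

        if (evaluate_line(input, &result) == 0)
            printf("%g\n", result);
        cases++;
    }
    return cases;
}
```

In a harness built along these lines, main() would do nothing more than call fuzz_loop(0), so that every test case arrives on stdin. The memset and the reset of result at the top of each iteration are what "wiping all variables and buffers" amounts to in a program this small; a real target usually has more state to reset.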
I named the file containing the modified function harness.c, and in order to compile the program with this file and not with main.c, I changed the automake file from:
It is now time to fuzz the program. I ran automake in order to generate the make files, and then make itself with the command:
This makes the program compile with AFL instrumentation.
The only thing left to do now is to create the initial input folder for AFL, which I called ‘in’, and the output folder ‘out’. In the input folder, one puts simple text files (without a format) that contain legitimate input, like “20/2” or “5*4”. Since we want to run a couple of AFL processes (each process running on its own CPU core), we create a separate input folder for each process.
In order to fuzz, one simply runs the command:
For the first process, the second being
And so on and so forth.
As you can see in the following screenshot, AFL has a text-based status screen that gives important information about the fuzzing process. For instance, coverage measurements are given under map coverage, and under findings in depth the screen also shows the number of crashes and related information.
After a sufficient number of crashes has been found (the exact number is up to you), usually after a long period of time without a new unique crash, you can terminate the process. For every process of AFL you run, a separate folder will be generated in the output folder:
Inside these folders, the data is arranged like this:
It’s important to explain what information each of these contains:
- plot_data – records the information in a form conducive to generating a plot.
- fuzzer_stats – statistics about the fuzzing process.
- fuzz_bitmap – a bitmap in which each byte counts how often a particular branch combination of the program was taken:
“AFL maintains a “fuzz bitmap”, with each byte within the bitmap representing a count of the number of times a particular branch within the fuzzed program has been taken. AFL does not perform a one-to-one mapping between a particular branch and a byte within the bitmap. Instead, AFL’s embedded instrumentation places a random two-byte constant ID into each branch. Whenever execution reaches an instrumented branch, AFL performs an XOR of the new branch’s ID and the last branch ID seen prior to arriving at the new branch. This captures both the current branch and the unique path taken to reach it (such as when the same function is called from multiple locations in the code). AFL then applies a hashing function to the XOR’d value to determine which entry in the bitmap represents that branch combination. Whenever a particular branch combination is exercised, the appropriate byte is incremented within the bitmap.”
- cmdline – the command that was supplied to AFL. In practice the program’s name.
- .cur_input – the current input.
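- queue – the test cases AFL has kept so far: the initial input files plus every generated input that triggered new behavior in the program (not every input that was tried).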
- crashes and hangs – the results. Each file in these folders contains input that caused crashes or hangs. Specified in the name of each file is the kernel signal of the crash (not relevant in hangs, of course), the ID of the input used by AFL to create the input that caused the crash, the time passed since AFL began to run, and the mutation used to generate the input from its seed.
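The fuzz_bitmap update described in the quotation above can be sketched in a few lines of C. This is a simplified model, not AFL's actual instrumentation: the names coverage_map, record_branch, reset_trace and edge_hits are mine, and the sketch follows the scheme in AFL's technical notes, where the XOR of the current branch ID and the right-shifted previous ID is used directly as the index into a 64 KiB map. The extra hashing step mentioned in the quotation is not modeled here.

```c
#include <stdint.h>
#include <string.h>

#define MAP_SIZE 65536u /* 64 KiB map, one byte per branch combination */

static uint8_t  coverage_map[MAP_SIZE];
static uint16_t prev_location; /* shifted ID of the last branch seen */

/* Would be called at every instrumented branch with that branch's
 * random two-byte ID. */
void record_branch(uint16_t cur_location)
{
    /* The XOR ties the current branch to the one executed just before it. */
    coverage_map[(uint16_t)(cur_location ^ prev_location)]++;

    /* Shifting keeps A->B distinct from B->A, and A->A distinct from B->B. */
    prev_location = cur_location >> 1;
}

/* Clears the map before the next test case runs. */
void reset_trace(void)
{
    memset(coverage_map, 0, sizeof coverage_map);
    prev_location = 0;
}

/* Hit count recorded for the transition from branch `from` to branch `to`. */
uint8_t edge_hits(uint16_t from, uint16_t to)
{
    return coverage_map[(uint16_t)(to ^ (from >> 1))];
}
```

Because the index depends on both IDs, two unrelated branch pairs can occasionally land on the same byte. That is the price of not keeping a one-to-one mapping, and it is why the map is best read as an approximation of coverage rather than an exact record.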