The common wisdom is that if you want to fuzz data formats with such ornate grammars, you need to build an one-off, protocol-specific mutation engine with the appropriate syntax templates baked in. Of course, writing such code isn’t easy. In essence, you need to manually build a model precise enough so that the generated test cases almost always make sense to the targeted parser – but creative enough to trigger unintended behaviors in that codebase. It takes considerable experience and a fair amount of time to get it just right.
I was thinking about using afl-fuzz to reach some middle ground between the two worlds. I quickly realized that if you give the fuzzer a list of basic syntax tokens – say, the set of reserved keywords defined in the spec – the instrumentation-guided nature of the tool means that even if we just mindlessly clobber the tokens together, we will be able to distinguish between combinations that are nonsensical and ones that actually follow the rules of the underlying grammar and therefore trigger new states in the instrumented binary. By discarding that first class of inputs and refining the other, we could progressively construct more complex and meaningful syntax as we go.
Ideas are cheap, but when I implemented this one, it turned out to be a good bet. For example, I tried it against sqlite, with the fuzzer fed a collection of keywords grabbed from the project’s docs (-x testcases/_extras/sql/). Equipped with this knowledge, afl-fuzz quickly spewed out a range of valid if unusual statements, such as:
select round( -1)““;
select group_concat(DISTINCT+1) |1;
select length(?)in( hex(1)+++1,1);
select abs(+0+ hex(1)-NOT+1) t1;
select DISTINCT “Y”,”b”,(1)”Y”,”b”,(1);
select – (1)AND”a”,”b”;
select – “a”LIMIT- /* */ /* */- /* */ /* */-1;
select strftime(1, sqlite_source_id());
(It also found a couple of crashing bugs.)
All right, all right: grabbing keywords is much easier than specifying the underlying grammar, but it still takes some work. I’ve been wondering how to scratch that itch, too – and came up with a fairly simple algorithm that can help those who do not have the time or the inclination to construct a proper dictionary.
To explain the approach, it’s useful to rely on the example of a PNG file. The PNG format uses four-byte, human-readable magic values to indicate the beginning of a section, say:
89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 | .PNG........IHDR
00 00 00 20 00 00 00 20 02 03 00 00 00 0e 14 92 | ................
The algorithm in question can identify “IHDR” as a syntax token by piggybacking on top of the deterministic, sequential bit flips that are already being performed by afl-fuzz across the entire file. It works by identifying runs of bytes that satisfy a simple property: that flipping them triggers an execution path that is distinct from the product of flipping stuff in the neighboring regions, yet consistent across the entire sequence of bytes.
This signal strongly implies that touching any of the affected bytes causes the failure of an underlying atomic check, such as header.magic_value == 0xDEADBEEF or strcmp(name, “Set-Cookie”). When such a behavior is detected, the entire blob of data is added to the dictionary, to be randomly recombined with other dictionary tokens later on.
This second trick is not a substitute for a proper, hand-crafted list of keywords; for one, it will only know about the syntax tokens that were present in the input files, or could be synthesized easily. It will also not do much when pitted against optimized, tree-based parsers that do not perform atomic string comparisons. (The fuzzer itself can often clear that last obstacle anyway, but the process will be slow.)
Well, that’s it. If you want to try out the new features, click here and let me know how it goes!