Whitebox Fuzz Testing

This paper contributes a novel fuzz testing mechanism called SAGE (Scalable, Automated, Guided Execution). It covers more code than traditional blackbox fuzzing by tracking the conditionals an input exercises and using a constraint solver to enumerate relevant new inputs. While a full enumeration would be exhaustive (and hence amount to formal verification), in practice the space of execution paths is far too large to enumerate fully. For this reason, much of the paper is devoted to finding the "sweet spot" for the symbolic execution algorithm.
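The core loop can be illustrated with a small, hypothetical sketch (a toy parser with single-byte constraints, not the paper's actual implementation): record the branch constraints an input exercises, negate one, and solve for a new input that takes the other side of that branch.

```python
# Hypothetical sketch of dynamic test generation (toy names, not SAGE itself):
# run an instrumented "parser" while recording the constraints its branches
# place on the input bytes, then negate one constraint and solve for a new
# input that drives execution down the other side.

def toy_parser(data):
    """Records (byte_index, expected_byte, taken) for each data-dependent branch."""
    trace = []
    trace.append((0, ord('b'), data[0] == ord('b')))
    if data[0] != ord('b'):
        return trace, "rejected"
    trace.append((1, ord('!'), data[1] == ord('!')))
    if data[1] == ord('!'):
        return trace, "rare path"   # random mutation almost never reaches this
    return trace, "common path"

def flip_last_constraint(trace, data):
    """Negate the final branch constraint and 'solve' it (trivial here, since
    each toy constraint fixes a single byte)."""
    idx, expected, taken = trace[-1]
    new = bytearray(data)
    new[idx] = expected if not taken else (expected + 1) % 256
    return bytes(new)

seed = b"ba"                        # well-formed seed input
trace, _ = toy_parser(seed)
new_input = flip_last_constraint(trace, seed)   # solver output: b"b!"
_, result = toy_parser(new_input)   # negated constraint steers us to the rare path
```

A real implementation would collect constraints from machine-level instrumentation and discharge them with an actual constraint solver; the single-byte "solve" above only stands in for that step.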

Paper Review

  • Problem: Traditional fuzz testing can miss a lot of code hidden behind conditional statements that have a very low probability of being satisfied by random input (e.g. a branch guarded by x == 3, where x ranges over [i32::MIN, i32::MAX]). The goal of this paper is to increase the code coverage of these traditional approaches and thereby uncover more bugs.
  • Approach: Instead of using random mutations as input, this paper proposes an algorithm SAGE that starts with well-formed input, gathers constraints from the various conditional code branches found via symbolic execution, and then uses a constraint solver to produce new inputs that strategically exercise various control paths, thus increasing code coverage.
  • Conclusions: One conclusion drawn is that this research effort was indeed successful; a number of bugs were found across a variety of applications that were missed by previous blackbox fuzz testing rounds. While many of these bugs could have been found via blackbox fuzzing with grammars that generate the appropriate test cases, such grammars require manual effort and must be written ad hoc for each application. Another conclusion is that SAGE was most successful when run on multiple seed files, with only a limited number of generations for each one.
  • New ideas: 1. This paper's main new idea is a "Generational Search" algorithm that improves over the state of the art in dynamic test generation. 2. The new SAGE system in turn yields further findings, e.g.: well-formed seed files perform much better than bogus ones, a variety of seed files is necessary for finding a variety of bugs, and almost all bugs are found within shallow generations.
  • Improvements: 1. Honestly, one improvement would be the title. While reading this paper I was (quite reasonably) under the impression that the ideas combined dynamic test generation with fuzz testing. Only in Section 2.3 is the term "whitebox fuzzing" justified, and simply because importance is placed on the initial input, as is done in blackbox fuzz testing. The cynic in me thinks that the reasoning is rather flimsy, and that the paper is capitalizing on fuzzing as a buzzword. 2. The authors mostly describe grammars as a drawback of blackbox fuzzing, since they are application-specific and not as scalable. However, it should be possible to use grammar-based generation to produce better seed files for SAGE. I think the paper could be improved if they experimented with this; it could also be considered a new direction for a future paper.
  • New directions: 1. One important direction following this paper would simply be an extension of the same experiments. As the conclusion notes, the results are drawn from a small sample, since SAGE has so far been applied to only a few applications. Further experiments are therefore worthwhile, to confirm or expand on the conclusions drawn in these initial runs. 2. Another direction is to further improve the precision of the symbolic execution method.
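For reference, the generational search idea can be sketched in miniature (toy program and hypothetical names; the real SAGE instruments x86 binaries and uses an actual constraint solver). Each expanded execution negates every path constraint at or beyond its bound, producing one child per flipped branch, and children are prioritized by an incremental block-coverage score.

```python
# Hypothetical miniature of a generational search. A "run" returns the path
# constraints it followed plus the basic blocks it covered; "expand" negates
# each constraint at position >= the node's bound, so no child re-flips a
# branch already flipped by an ancestor.
import heapq

def run(data):
    """Toy instrumented run: returns (path_constraints, blocks_covered).
    Each constraint is (byte_index, expected_byte, taken)."""
    magic = b"bad!"
    constraints, blocks = [], set()
    for i, m in enumerate(magic):
        if i >= len(data):
            break
        taken = data[i] == m
        constraints.append((i, m, taken))
        blocks.add((i, taken))
        if not taken:
            break
    return constraints, blocks

def expand(data, bound):
    """Generational expansion: one child per negated constraint at j >= bound;
    the child inherits bound j + 1."""
    constraints, _ = run(data)
    children = []
    for j in range(bound, len(constraints)):
        idx, expected, taken = constraints[j]
        child = bytearray(data)
        child[idx] = expected if not taken else (expected + 1) % 256
        children.append((bytes(child), j + 1))
    return children

def generational_search(seed, max_runs=20):
    covered = set()
    worklist = [(0, seed, 0)]          # (negative score, input, bound)
    runs = 0
    while worklist and runs < max_runs:
        _, data, bound = heapq.heappop(worklist)
        _, blocks = run(data)
        runs += 1
        covered |= blocks
        for child, child_bound in expand(data, bound):
            _, child_blocks = run(child)
            score = len(child_blocks - covered)   # block-coverage heuristic
            heapq.heappush(worklist, (-score, child, child_bound))
    return covered

coverage = generational_search(b"good")   # incrementally discovers b"bad!"
```

The bound is what makes the search generational rather than depth-first: it lets many children of one execution enter the worklist at once, which is why shallow generations dominate in practice.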