Practical Methods for Fuzzing Real-World Systems
Modern software is exceptionally complex, and a defining feature of this complexity is the vast input space that applications must process. This input space hinders fuzzing, an effective automated testing methodology, from uncovering deep bugs (i.e., bugs with complex preconditions). We improve the bug-finding capabilities of fuzzers by reducing the input space that they have to explore, using techniques that incorporate domain knowledge from the software under test. In this dissertation, we investigate how to incorporate such domain knowledge across a variety of software domains and test objectives to discover deep bugs.
We start by focusing on the language interpreters that form the backend of the web ecosystem. Uncovering deep bugs in these interpreters requires synthesizing inputs that perform a diverse set of semantic actions. To tackle this issue, we present Gramatron, a fuzzer that employs grammar automatons to speed up bug discovery. Next, we explore firmware from the rapidly growing IoT ecosystem, which generally lacks thorough testing. FirmFuzz infers the runtime state required to trigger vulnerabilities in such firmware using the domain knowledge encoded in its user-facing network applications. We then show that our strategy of incorporating domain knowledge is also beneficial in alternative testing scenarios where a developer analyzes specific code locations, e.g., for patch testing. SieveFuzz leverages knowledge of the targeted code locations to prohibit exploration of code regions, and correspondingly of the parts of the input space, that are irrelevant to reaching the target location. Finally, we move beyond memory-safety vulnerabilities and show, with Crystallizer, how domain knowledge can help uncover logic bugs, specifically deserialization vulnerabilities in Java-based applications. Crystallizer uses a hybrid analysis methodology: it first infers an over-approximate set of possible payloads through static analysis, which constrains the search space, and then uses dynamic analysis to instantiate concrete payloads that serve as proofs of concept for a deserialization vulnerability.
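As a rough illustration of the targeted-testing idea described for SieveFuzz above, the following sketch computes which functions can reach a target location in a call graph and treats every other function as prohibited. It is a simplified sketch under stated assumptions, not SieveFuzz's actual implementation: the call graph, the function names, and the helpers `functions_reaching` and `functions_to_prohibit` are all hypothetical, and a real tool would likely need a finer-grained analysis than function-level call edges.

```python
from collections import deque

def functions_reaching(call_graph, target):
    """Return every function from which `target` is reachable (target included)."""
    # Invert the call graph: map each callee to the set of its callers.
    callers = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    # Breadth-first search backwards from the target over caller edges.
    relevant = {target}
    queue = deque([target])
    while queue:
        fn = queue.popleft()
        for caller in callers.get(fn, ()):
            if caller not in relevant:
                relevant.add(caller)
                queue.append(caller)
    return relevant

def functions_to_prohibit(call_graph, target):
    """Return the functions that cannot reach `target` via call edges."""
    all_functions = set(call_graph)
    for callees in call_graph.values():
        all_functions.update(callees)
    return all_functions - functions_reaching(call_graph, target)

# Hypothetical call graph (caller -> callees), for illustration only.
CALL_GRAPH = {
    "main": ["parse_header", "log_stats"],
    "parse_header": ["parse_chunk"],
    "parse_chunk": ["decode_pixels"],
    "log_stats": ["format_time"],
}

if __name__ == "__main__":
    # Functions a directed fuzzer could skip when targeting decode_pixels.
    print(sorted(functions_to_prohibit(CALL_GRAPH, "decode_pixels")))
```

In this toy graph, inputs that only exercise `log_stats` and `format_time` cannot contribute to reaching `decode_pixels`, so the corresponding part of the input space need not be explored.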
Across these four diverse areas, we demonstrate how incorporating domain knowledge can significantly improve bug-finding capabilities. Our research has produced tooling that not only outperforms the existing state of the art in efficient bug discovery (up to 117% faster), but has also uncovered 18 previously unknown bugs, with five CVEs assigned.
- Doctor of Philosophy
- Computer Science
- West Lafayette