A Comparative Study of Fuzzers and Static Analysis Tools for Finding Memory Unsafety in C and C++
Keno Hassler, Philipp Görz, Stephan Lipp, Thorsten Holz, Marcel Böhme
TL;DR
Memory-safety vulnerabilities in C/C++ remain a critical risk in modern software. This paper presents the first systematic cross-domain comparison of fuzzing and static analysis. It evaluates five static analyzers and 13 fuzzers on the Magma benchmark (112 CVEs across seven programs) to quantify true positives, overhead, and complementarity. Fuzzers and static analyzers detect largely different bugs, with AFL++ and CodeQL leading their respective classes. Combining the two classes finds more bugs than either alone, but it requires substantial manual effort to triage false positives and deduplicate reports. The work offers practical guidance for maintainers, discusses integration into development workflows, and outlines research directions to foster collaboration between the fuzzing and static-analysis communities. Overall, the results favor a hybrid bug-finding strategy and a move toward safer language designs that improve memory safety proactively.
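Complementarity can be made concrete by comparing the sets of benchmark bugs that two tools detect. The following Python sketch illustrates the idea under stated assumptions and is not the paper's evaluation code. The function names `complementarity` and `union_coverage`, the tool labels, and the bug IDs are hypothetical toy values. A bug counts as "detected" if the tool produced at least one true-positive report for it.

```python
from typing import Dict, Iterable, Set

# Hypothetical input format: tool name -> set of benchmark bug IDs the tool
# detected (at least one true-positive report pointing at that bug).
Detections = Dict[str, Set[str]]


def complementarity(found: Detections, a: str, b: str) -> Dict[str, int]:
    """Count bugs found only by `a`, only by `b`, by both, and by either."""
    sa, sb = found[a], found[b]
    return {
        f"only_{a}": len(sa - sb),
        f"only_{b}": len(sb - sa),
        "both": len(sa & sb),
        "union": len(sa | sb),
    }


def union_coverage(found: Detections, tools: Iterable[str], total_bugs: int) -> float:
    """Fraction of all benchmark bugs detected by at least one of `tools`."""
    if total_bugs <= 0:
        raise ValueError("total_bugs must be positive")
    covered: Set[str] = set()
    for tool in tools:
        covered |= found[tool]
    return len(covered) / total_bugs


if __name__ == "__main__":
    # Toy data only; these are NOT results from the study.
    toy: Detections = {
        "fuzzer_x": {"BUG01", "BUG02", "BUG03"},
        "analyzer_y": {"BUG03", "BUG04"},
    }
    print(complementarity(toy, "fuzzer_x", "analyzer_y"))
    # {'only_fuzzer_x': 2, 'only_analyzer_y': 1, 'both': 1, 'union': 4}
    print(union_coverage(toy, ["fuzzer_x", "analyzer_y"], total_bugs=10))
    # 0.4
```

In the toy data, the two tools share only one bug and their union covers four. Running both tools therefore covers more bugs than either one alone. This set-level effect is what a hybrid strategy relies on.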
Abstract
Even today, over 70% of security vulnerabilities in critical software systems result from memory safety violations. To address this challenge, fuzzing and static analysis are widely used automated methods to discover such vulnerabilities. Fuzzing generates random program inputs to identify faults, while static analysis examines source code to detect potential vulnerabilities. Although these techniques share a common goal, they take fundamentally different approaches and have evolved largely independently. In this paper, we present an empirical analysis of five static analyzers and 13 fuzzers, applied to over 100 known security vulnerabilities in C/C++ programs. We measure the number of bug reports generated for each vulnerability to evaluate how the approaches differ and complement each other. Moreover, we randomly sample eight bug-containing functions, manually analyze all bug reports therein, and quantify false-positive rates. We also assess limits to bug discovery, ease of use, resource requirements, and integration into the development process. We find that the two techniques discover different types of bugs and that each has a clear winner. Developers should choose among these tools based on their specific workflow and usability requirements. Based on our findings, we propose future directions to foster collaboration between these research domains.
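The false-positive measurement over sampled functions can be illustrated with a small Python sketch. It assumes a hypothetical `Report` record that holds the manual triage verdict. It defines the false-positive rate as the share of a tool's reports within the sampled functions that turn out to be false, which is 1 minus precision. The abstract does not spell out the exact formula, so treat this definition, the function names, and the toy data as assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Report:
    """One tool warning after manual triage (hypothetical record format)."""
    tool: str
    function: str           # function the warning points into
    is_true_positive: bool  # verdict from manual analysis


def false_positive_rate(
    reports: Iterable[Report], tool: str, sampled_functions: Set[str]
) -> Optional[float]:
    """Share of `tool`'s reports inside the sampled functions that are false.

    Assumes FP rate = false reports / all triaged reports (1 - precision);
    returns None when the tool reported nothing in the sample.
    """
    triaged = [
        r for r in reports
        if r.tool == tool and r.function in sampled_functions
    ]
    if not triaged:
        return None
    false_reports = sum(1 for r in triaged if not r.is_true_positive)
    return false_reports / len(triaged)


if __name__ == "__main__":
    # Toy triage results; not data from the study.
    reports = [
        Report("analyzer_a", "parse_chunk", True),
        Report("analyzer_a", "parse_chunk", False),
        Report("analyzer_a", "read_header", False),
        Report("analyzer_a", "unsampled_fn", False),  # outside the sample
    ]
    sample = {"parse_chunk", "read_header"}
    # 2 of the 3 triaged reports are false, so this prints about 0.667.
    print(false_positive_rate(reports, "analyzer_a", sample))
```

The sketch uses a per-report definition because static-analysis warnings have no natural count of true negatives. The classical rate FP / (FP + TN) would require such a count.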
