An Empirical Study of Static Analysis Tools for Secure Code Review
Wachiraphan Charoenwet, Patanamon Thongtanunam, Van-Thuan Pham, Christoph Treude
TL;DR
This study evaluates the practical utility of five C/C++ SASTs for secure code review using real vulnerability-contributing commits (VCCs) from 92 projects (815 VCCs, 1,060 vulnerable functions, 319 exploitable vulnerabilities). It shows that a single SAST can warn about vulnerabilities in the changed functions for about $52\%$ of VCCs, while combining tools raises detection to $78\%$ at the function level; however, at least $76\%$ of warnings in vulnerable functions are irrelevant to the VCCs, and $22\%$ of VCCs remain undetected. The authors show that warning-based prioritization can improve precision by up to $12\%$ and recall by up to $5.6\%$, and can reduce Initial False Alarm by up to $13\%$, within $25\%$ of the code-review effort, with Warning Density often yielding the best gains. Computation times vary from about $20$ seconds (Flawfinder) to roughly $45$ minutes (Cppcheck) and scale with project size, highlighting practical constraints for real-time code-review workflows. The work provides actionable guidance for practitioners and SAST developers, and releases an automated benchmarking framework and an annotated dataset to support ongoing research in SAST-supported secure code review.
Abstract
Early identification of security issues in software development is vital to minimize their unanticipated impacts. Code review is a widely used manual analysis method that aims to uncover security issues along with other coding issues in software projects. While some studies suggest that automated static application security testing tools (SASTs) could enhance security issue identification, there is limited understanding of their practical effectiveness in supporting secure code review. Moreover, most SAST studies rely on synthetic or fully vulnerable versions of the subject program, which may not accurately represent real-world code changes in the code review process. To address this gap, we study C/C++ SASTs using a dataset of actual code changes that contributed to exploitable vulnerabilities. Beyond the effectiveness of SASTs, we quantify the potential benefits when changed functions are prioritized by SAST warnings. Our dataset comprises 319 real-world vulnerabilities from 815 vulnerability-contributing commits (VCCs) in 92 C and C++ projects. The results reveal that a single SAST can produce warnings in vulnerable functions for 52% of VCCs. Prioritizing changed functions with SAST warnings can improve accuracy (precision by 12% and recall by 5.6%) and reduce Initial False Alarm (lines of code in non-vulnerable functions inspected until the first vulnerable function) by 13%. Nevertheless, at least 76% of the warnings in vulnerable functions are irrelevant to the VCCs, and 22% of VCCs remain undetected due to limitations of SAST rules. Our findings highlight the benefits and the remaining gaps of SAST-supported secure code review, as well as challenges that should be addressed in future work.
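To make the prioritization metrics concrete, the following is a minimal illustrative sketch, not the authors' benchmarking framework. It ranks the changed functions of one hypothetical commit by warning density and computes Initial False Alarm as defined above. Several details are assumptions made only for illustration: warning density is taken to be the number of warnings divided by the function's lines of code, review effort is taken to be the share of changed lines inspected, and all function names and numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class ChangedFunction:
    name: str         # hypothetical function name
    loc: int          # lines of code in the function
    warnings: int     # number of SAST warnings reported in the function
    vulnerable: bool  # ground-truth label (known only in a benchmark)


def rank_by_warning_density(functions):
    """Order functions by warnings per line of code, highest first (assumed definition)."""
    return sorted(
        functions,
        key=lambda f: f.warnings / f.loc if f.loc else 0.0,
        reverse=True,
    )


def initial_false_alarm(ordered):
    """Lines of code in non-vulnerable functions inspected before the first vulnerable one."""
    inspected = 0
    for f in ordered:
        if f.vulnerable:
            return inspected
        inspected += f.loc
    return inspected  # no vulnerable function was found at all


def precision_recall_at_effort(ordered, effort=0.25):
    """Precision and recall over the functions that fit within a share of the total LOC."""
    budget = effort * sum(f.loc for f in ordered)
    total_vulnerable = sum(f.vulnerable for f in ordered)
    inspected_loc = 0
    selected = []
    for f in ordered:
        if inspected_loc + f.loc > budget:
            break  # the next function would exceed the review budget
        inspected_loc += f.loc
        selected.append(f)
    hits = sum(f.vulnerable for f in selected)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / total_vulnerable if total_vulnerable else 0.0
    return precision, recall


# Invented example: four changed functions in the order a reviewer might read them.
functions = [
    ChangedFunction("log_event", loc=100, warnings=2, vulnerable=False),
    ChangedFunction("init_table", loc=20, warnings=1, vulnerable=False),
    ChangedFunction("parse_header", loc=40, warnings=4, vulnerable=True),
    ChangedFunction("copy_buf", loc=40, warnings=0, vulnerable=True),
]

ranked = rank_by_warning_density(functions)
print("IFA without ranking:", initial_false_alarm(functions))
print("IFA with ranking:   ", initial_false_alarm(ranked))
print("Precision/recall at 25% effort:", precision_recall_at_effort(ranked, 0.25))
```

In this invented commit, ranking moves `parse_header` to the top, so the reviewer reaches a vulnerable function immediately. `copy_buf` has no warnings and falls to the bottom, which mirrors the abstract's point that some vulnerable code goes undetected by SAST rules.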
