An Empirical Study of Static Analysis Tools for Secure Code Review

Wachiraphan Charoenwet; Patanamon Thongtanunam; Van-Thuan Pham; Christoph Treude

TL;DR

This study evaluates the practical utility of five C/C++ SASTs for secure code reviews using real vulnerability-contributing commits from 92 projects (815 VCCs, 1,060 vulnerable functions, 319 exploitable vulnerabilities). It demonstrates that a single SAST can warn about vulnerabilities in the changed functions for about $52\%$ of VCCs, while combining tools raises detection to $78\%$ at the function level; however, $76\%$ of warnings in vulnerable functions are irrelevant and $22\%$ of VCCs remain undetected. The authors show that warning-based prioritization can improve precision by up to $12\%$, recall by up to $5.6\%$, and reduce Initial False Alarm by up to $13\%$ within a 25% code-review effort, with Warning Density often yielding the best gains. Computation times vary from about $20$s (Flawfinder) to roughly $45$ minutes (Cppcheck) and scale with project size, highlighting practical constraints for real-time code-review workflows. The work provides actionable guidance for practitioners and SAST developers, and releases an automated benchmarking framework and annotated dataset to support ongoing research in SAST-supported secure code reviews.

Abstract

Early identification of security issues in software development is vital to minimize their unanticipated impacts. Code review is a widely used manual analysis method that aims to uncover security issues along with other coding issues in software projects. While some studies suggest that automated static application security testing tools (SASTs) could enhance security issue identification, there is limited understanding of SASTs' practical effectiveness in supporting secure code review. Moreover, most SAST studies rely on synthetic or fully vulnerable versions of the subject program, which may not accurately represent real-world code changes in the code review process. To address this gap, we study C/C++ SASTs using a dataset of actual code changes that contributed to exploitable vulnerabilities. Beyond SASTs' effectiveness, we quantify the potential benefits of prioritizing changed functions by SAST warnings. Our dataset comprises 319 real-world vulnerabilities from 815 vulnerability-contributing commits (VCCs) in 92 C and C++ projects. The results reveal that a single SAST can produce warnings in the vulnerable functions of 52% of VCCs. Prioritizing changed functions with SAST warnings can improve accuracy (precision by 12% and recall by 5.6%) and reduce Initial False Alarm (lines of code in non-vulnerable functions inspected until the first vulnerable function) by 13%. Nevertheless, at least 76% of the warnings in vulnerable functions are irrelevant to the VCCs, and 22% of VCCs remain undetected due to limitations of SAST rules. Our findings highlight the benefits and the remaining gaps of SAST-supported secure code reviews, as well as challenges that should be addressed in future work.
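The prioritization idea and the Initial False Alarm metric described above can be illustrated with a small sketch. This is not the authors' benchmarking framework: the `ChangedFunction` type, the function names, and the sample data are hypothetical, Warning Density is assumed here to mean warnings per line of code, and the effort budget is assumed to be a fraction of the total changed lines.

```python
from dataclasses import dataclass


@dataclass
class ChangedFunction:
    name: str         # hypothetical function name
    loc: int          # lines of code in the function
    warnings: int     # SAST warnings reported inside the function
    vulnerable: bool  # ground-truth label for the commit under review


def rank_by_warning_density(functions):
    """Order functions by warnings per line of code, highest first (assumed definition)."""
    return sorted(
        functions,
        key=lambda f: f.warnings / f.loc if f.loc else 0.0,
        reverse=True,
    )


def initial_false_alarm(ordered):
    """Lines of code in non-vulnerable functions inspected before the first vulnerable one."""
    inspected = 0
    for f in ordered:
        if f.vulnerable:
            return inspected
        inspected += f.loc
    return inspected  # no vulnerable function in the list


def precision_recall_at_effort(ordered, effort=0.25):
    """Precision and recall over the functions that fit in an effort budget (fraction of total LOC)."""
    budget = effort * sum(f.loc for f in ordered)
    total_vulnerable = sum(f.vulnerable for f in ordered)
    inspected, used = [], 0
    for f in ordered:
        if used + f.loc > budget:
            break
        used += f.loc
        inspected.append(f)
    hits = sum(f.vulnerable for f in inspected)
    precision = hits / len(inspected) if inspected else 0.0
    recall = hits / total_vulnerable if total_vulnerable else 0.0
    return precision, recall


# Hypothetical changed functions from one commit, in their original order.
changed = [
    ChangedFunction("parse_header", loc=40, warnings=0, vulnerable=False),
    ChangedFunction("read_chunk", loc=20, warnings=3, vulnerable=True),
    ChangedFunction("log_status", loc=20, warnings=1, vulnerable=False),
    ChangedFunction("init_state", loc=20, warnings=0, vulnerable=False),
]
ranked = rank_by_warning_density(changed)

print(initial_false_alarm(changed), initial_false_alarm(ranked))
print(precision_recall_at_effort(changed), precision_recall_at_effort(ranked))
```

In this toy commit the reviewer would read 40 non-vulnerable lines before reaching the vulnerable function in the original order, and none after ranking. Real commits will rarely be this clean, since the abstract notes that most warnings in vulnerable functions are unrelated to the vulnerability.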

Paper Structure (26 sections, 7 figures, 7 tables)

Introduction
Background and Definitions
Study Design
Research Questions
Data preparation
Dataset
Selecting VCCs
Vulnerability Type Grouping
Identifying vulnerable changes location in VCCs
Execution
Studied Tools
Experiment Setup
Warning Type Grouping
Analyses & Results
Detection Effectiveness (RQ1)
...and 11 more sections

Figures (7)

Figure 1: Overview of our study approach.
Figure 2: An illustrative example of grouping CWE items (CWE-825 and CWE-457) to the CWE pillar (CWE-664).
Figure 3: Percentages of VCCs that the tools can detect in different scenarios.
Figure 4: A Venn diagram displaying VCCs for which the tools can produce warning(s) in the vulnerable functions (S5:1Fn-Any).
Figure 5: An illustration to depict changed functions without prioritization (above) and with warning-based prioritization (below) using a real VCC from libsndfile.
...and 2 more figures
