Static Application Security Testing (SAST) Tools for Smart Contracts: How Far Are We?
Kaixuan Li, Yue Xue, Sen Chen, Han Liu, Kairan Sun, Ming Hu, Haijun Wang, Yang Liu, Yixiang Chen
TL;DR
This work tackles the challenge of objectively evaluating SAST tools for smart contracts by building an up-to-date vulnerability taxonomy of 45 types and a large, diverse benchmark covering 40 of them. It systematically compares 8 SAST tools (including one commercial CSA) on 788 vulnerable contracts containing 10,394 ground-truth vulnerabilities, drawn from 8,981 contracts in total, and finds that roughly half of the vulnerabilities go undetected while precision remains below 10% for many tools. The study further shows that combining tools can substantially improve recall (up to 91.5%) at the cost of many more false positives, whereas majority voting improves precision but reduces recall. Practical implications are drawn for tool developers, researchers, and practitioners, stressing more robust compilation, refined detection semantics, and hybrid tool integration to strengthen smart-contract security.
Abstract
In recent years, the growing number of attacks against smart contracts has heightened the importance of their security. To address this issue, a multitude of static application security testing (SAST) tools have been proposed for detecting vulnerabilities in smart contracts. However, objectively comparing these tools to determine their effectiveness remains challenging. Existing studies often fall short because their taxonomies and benchmarks cover only a coarse and potentially outdated set of vulnerability types, which leads to evaluations that are not comprehensive and may be biased. In this paper, we fill this gap by proposing an up-to-date and fine-grained taxonomy that includes 45 unique vulnerability types for smart contracts. Taking it as a baseline, we develop an extensive benchmark that covers 40 distinct types and includes a diverse range of code characteristics, vulnerability patterns, and application scenarios. We then evaluate 8 SAST tools on this benchmark, which comprises 788 smart contract files and 10,394 vulnerabilities. Our results reveal that the existing SAST tools fail to detect around 50% of the vulnerabilities in our benchmark and suffer from high false positives, with precision not surpassing 10%. We also find that combining the results of multiple tools effectively reduces the false negative rate, at the expense of flagging 36.77 percentage points more functions. Nevertheless, many vulnerabilities, especially those beyond Access Control and Reentrancy, remain undetected. Finally, we highlight the insights from our study, hoping to provide guidance on tool development, enhancement, evaluation, and selection for developers, researchers, and practitioners.
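The trade-off between combining tool reports and voting over them can be illustrated with a minimal Python sketch. It is not the paper's evaluation pipeline: the tool names, function names, and findings below are hypothetical toy data, and "majority" is assumed here to mean that more than half of the tools flag a function.

```python
from collections import Counter


def union_vote(reports):
    """Flag a function if at least one tool flags it."""
    flagged = set()
    for findings in reports.values():
        flagged |= findings
    return flagged


def majority_vote(reports):
    """Flag a function only if more than half of the tools flag it."""
    counts = Counter(f for findings in reports.values() for f in findings)
    threshold = len(reports) / 2
    return {f for f, n in counts.items() if n > threshold}


def precision_recall(flagged, ground_truth):
    """Return (precision, recall) of a flagged set against the ground truth."""
    true_positives = len(flagged & ground_truth)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical toy data: functions that are truly vulnerable,
# and the functions each (made-up) tool reports as vulnerable.
ground_truth = {"withdraw", "setOwner", "mint"}
reports = {
    "tool_a": {"withdraw", "transfer", "deposit"},
    "tool_b": {"withdraw", "setOwner", "approve"},
    "tool_c": {"withdraw", "transfer"},
}

if __name__ == "__main__":
    for name, strategy in (("union", union_vote), ("majority", majority_vote)):
        flagged = strategy(reports)
        precision, recall = precision_recall(flagged, ground_truth)
        print(f"{name}: flagged={len(flagged)} "
              f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy input the union flags five functions and recovers two of the three true vulnerabilities, while the majority vote flags only two functions and recovers one. The direction of the effect matches the one reported above (higher recall with more flagged functions for the union, higher precision with lower recall for voting), but the numbers themselves carry no empirical meaning.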
