FineWAVE: Fine-Grained Warning Verification of Bugs for Automated Static Analysis Tools
Han Liu, Jian Zhang, Cen Zhang, Xiaohan Zhang, Kaixuan Li, Sen Chen, Shang-Wei Lin, Yixiang Chen, Xinhua Li, Yang Liu
TL;DR
FineWAVE targets the pervasive problem of false positives in automated static analysis by performing fine-grained, bug-sensitive warning verification. It introduces a multi-modal LSTM-based model that combines cross-attention with warning-aware code slicing and warning information encoding, and uses focal loss to handle severe class imbalance. The authors build BSWarnings, the largest dataset of bug-sensitive warnings to date, and show that FineWAVE significantly reduces false alarms (F1-score of $97.79\%$ for false alarms) while confirming bug-sensitive warnings (F1-score of $67.06\%$). Applied to four real-world projects, it filters out about $92\%$ of warnings and helps find 25 new bugs. The approach outperforms eight baselines and remains practical for real-world integration, giving developers a precise, scalable way to prioritize debugging effort in large codebases.
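To make the focal loss mentioned above concrete, here is a minimal sketch of the standard binary focal loss, $FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$. The function name `binary_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are illustrative assumptions (common choices in the focal loss literature), not settings reported for FineWAVE, and the paper's exact formulation may differ.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for a single binary prediction (illustrative sketch).

    p     : predicted probability of the positive class, in (0, 1)
    y     : ground-truth label, 1 (positive) or 0 (negative)
    gamma : focusing parameter; larger values down-weight easy examples more
    alpha : class weight for the positive class (1 - alpha for the negative)

    gamma and alpha are assumed defaults, not values taken from FineWAVE.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)           # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confident, correct prediction contributes far less loss than a
# confident, wrong one, so training focuses on the hard examples.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
```

With `gamma=0` the modulating factor disappears and the expression reduces to class-weighted cross-entropy, which is why focal loss is often described as a generalization of it for imbalanced data.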
Abstract
Automated Static Analysis Tools (ASATs) have evolved over time to assist in detecting bugs. However, excessive false warnings can impede developers' productivity and undermine their confidence in the tools. Previous research efforts have explored learning-based methods to validate the reported warnings. Nevertheless, these methods operate at a coarse granularity, focusing on either long-term warnings or function-level alerts, and are therefore insensitive to individual bugs. They also rely on manually crafted features or solely on source code semantics, which is inadequate for effective learning. In this paper, we propose FineWAVE, a learning-based approach that verifies bug-sensitive warnings at a fine granularity. Specifically, we design a novel LSTM-based model that captures the multi-modal semantics of source code and ASAT warnings and highlights their correlations with cross-attention. To address the scarcity of data for training and evaluation, we collected a large-scale dataset of 280,273 warnings. We conducted extensive experiments on this dataset to evaluate FineWAVE. The experimental results demonstrate the effectiveness of our approach, with an F1-score of 97.79% for reducing false alarms and 67.06% for confirming actual warnings, significantly outperforming all baselines. Moreover, we applied FineWAVE to four popular real-world projects, where it filtered out about 92% of the warnings and helped us find 25 new bugs with minimal manual effort.
