
Actionable Warning Is Not Enough: Recommending Valid Actionable Warnings with Weak Supervision

Zhipeng Xue, Zhipeng Gao, Tongtong Xu, Xing Hu, Xin Xia, Shanping Li

TL;DR

This work tackles the high false alarm rates of static analysis tools by treating actionable warnings as candidates with varying likelihoods of being real bugs, rather than as confirmed bugs. It collects actionable warnings from Git histories and applies a weak supervision strategy that combines semantic and structural signals to label each warning as Very Likely To be a Bug (VTB), Likely To be a Bug (LTB), or Unlikely To be a Bug (UTB). It then introduces ACWRecommender, a two-stage framework that first warms up a UniXcoder-based detector and then fine-tunes a reranker to prioritize actionable warnings with a high probability of being real bugs (AWHBs). The model outperforms the baselines on nDCG and MRR, and its recommendations on real-world GitHub projects were confirmed by developers. A replication package and a cross-tool evaluation indicate that the approach generalizes across warning types and static analysis tools, and that it can reduce the effort developers spend triaging static analysis warnings.

Abstract

Static analysis tools have become increasingly popular among developers in the last few years. However, their widespread adoption is hindered by high false alarm rates. Previous studies have introduced the concept of actionable warnings and built machine-learning methods to distinguish actionable warnings from false alarms. According to our empirical observation, however, the assumption currently used to collect actionable warnings is shaky and inaccurate, which leads to a large number of invalid actionable warnings. To address this problem, we build the first large actionable warning dataset by mining 68,274 reversions from the Top-500 GitHub C repositories. We then take one step further and assign each actionable warning a weak label indicating its likelihood of being a real bug. Following that, we propose a two-stage framework called ACWRecommender to automatically recommend the actionable warnings with a high probability of being real bugs (AWHB). Our approach first warms up the pre-trained model UniXcoder on the task of identifying actionable warnings (coarse-grained detection stage) and then reranks AWHB to the top by weakly supervised learning (fine-grained reranking stage). Experimental results show that our proposed model outperforms several baselines by a large margin in terms of nDCG and MRR for AWHB recommendation. Moreover, we ran our tool on 6 randomly selected projects and manually checked the top-ranked warnings among the 2,197 reported warnings. We reported the top-10 recommended warnings to developers, and 27 of them have already been confirmed by developers as real bugs. Developers can thus quickly find real bugs among the massive number of reported warnings, which demonstrates the practical usefulness of our tool.
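The abstract reports results in terms of nDCG and MRR without defining them. The following is a minimal sketch of the standard textbook definitions of the two ranking metrics, not the authors' evaluation code. The function names and the graded gains in the usage note are assumptions made for illustration.

```python
import math
from typing import Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain of a ranked list of relevance gains."""
    # Position i (0-based) is discounted by log2(i + 2), so rank 1 has discount 1.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(gains: Sequence[float]) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


def mrr(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean reciprocal rank of the first relevant item in each ranked list."""
    total = 0.0
    for hits in rankings:
        for rank, hit in enumerate(hits, start=1):
            if hit:
                total += 1.0 / rank
                break  # only the first relevant item counts
    return total / len(rankings) if rankings else 0.0
```

As a usage example, assume hypothetical gains of 2, 1, and 0 for the three weak label classes, from most to least likely to be a bug. A ranking with gains `[2, 1, 0]` is already ideal and scores an nDCG of 1.0, while `mrr([[False, True], [True, False]])` averages reciprocal ranks 1/2 and 1 to give 0.75.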

Paper Structure

This paper contains 27 sections, 7 equations, 3 figures, 9 tables.

Figures (3)

  • Figure 1: Overview of Our Approach
  • Figure 2: Example of the Git Commit Graph
  • Figure 3: Recall@Top-K% curve