Static Code Analyzer Recommendation via Preference Mining
Xiuting Ge, Chunrong Fang, Xuanye Li, Ye Shang, Mengyao Zhang, Ya Pan
TL;DR
This work tackles the impracticality of applying all static code analyzers (SCAs) to a project by proposing a practical SCA recommendation approach based on preference mining. It first evaluates SCA effectiveness across 213 large-scale Java projects using a careful warning labeling and alignment pipeline, then mines the relationship between project characteristics and optimal SCAs via feature extraction, selection, and visualization. A multi-label classifier is built to predict the best SCA for a given project. Random Forests achieve the strongest performance, and a parameter $\beta$ controls the trade-off between precision and recall. Across extensive experiments, the proposed model consistently outperforms four baseline strategies, which suggests it can reduce warning overload while preserving defect detection capability.
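The role of $\beta$ can be illustrated with the standard $F_\beta$ score, $F_\beta = (1+\beta^2)PR / (\beta^2 P + R)$, where $P$ is precision and $R$ is recall. The sketch below assumes, without confirmation from this summary, that $\beta$ enters through $F_\beta$ and that every SCA tied for the best score on a project counts as optimal, which yields a multi-label target. The function names, SCA names, and precision/recall figures are hypothetical, not the authors' code or results.

```python
from typing import Dict, List, Tuple


def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score: beta > 1 weights recall more, beta < 1 weights precision more."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid 0/0 when an SCA finds nothing correct
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def optimal_scas(
    scores: Dict[str, Tuple[float, float]], beta: float = 1.0, tol: float = 1e-9
) -> List[str]:
    """Return every SCA whose F-beta ties the best one on a single project.

    `scores` maps a hypothetical SCA name to its (precision, recall) on that project.
    Treating all tied SCAs as optimal (a multi-label target) is an assumption.
    """
    fb = {name: f_beta(p, r, beta) for name, (p, r) in scores.items()}
    best = max(fb.values())
    return sorted(name for name, value in fb.items() if best - value <= tol)
```

For a project where a hypothetical `SCA_A` scores (P, R) = (0.5, 0.8) and `SCA_B` scores (0.8, 0.5), both tie at $\beta = 1$, while $\beta = 2$ favours the higher-recall `SCA_A` and $\beta = 0.5$ favours the higher-precision `SCA_B`.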
Abstract
Static Code Analyzers (SCAs) play a critical role in software quality assurance. However, because SCAs rely on different static analysis techniques, they suffer from different levels of false positives and false negatives, and their performance therefore varies. One way to detect more defects in a given project is to scan it with more of the available SCAs. However, invoking all available SCAs on a project is impractical in real scenarios, because it incurs unacceptable costs and produces an overwhelming number of warnings. To address this problem, we propose the first practical SCA recommendation approach based on preference mining, which aims to select the most effective SCA for a given project. Specifically, our approach first evaluates SCA effectiveness to identify the optimal SCAs for each project under test. It then mines SCA preferences from project characteristics to analyze the intrinsic relationship between the projects under test and their optimal SCAs. Finally, it builds an SCA recommendation model from the evaluation data and the findings of this analysis. We evaluate the approach on three popular SCAs and 213 large-scale open-source projects. The results show that our SCA recommendation model outperforms four typical baselines by 2 to 11 times.
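The final step maps a new project's characteristics to the SCAs that were optimal on similar projects. This work builds that model as a multi-label classifier, with Random Forests performing best; to keep the sketch dependency-free, it swaps in a k-nearest-neighbour vote instead, so it illustrates only the inputs and outputs of the recommendation step, not the authors' model. The feature vectors, SCA names, the value of `k`, and the alphabetical tie-breaking rule are all hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# One training example: (hypothetical project feature vector, list of optimal SCAs).
Example = Tuple[Sequence[float], List[str]]


def recommend(train: List[Example], query: Sequence[float], k: int = 3) -> str:
    """Recommend one SCA for `query` by voting over the labels of its k nearest projects.

    Stand-in for the Random Forest multi-label classifier described in this work.
    Features are assumed to be already normalised to comparable scales.
    """
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, labels in nearest for label in labels)
    # Highest vote count wins; ties are broken alphabetically for determinism.
    return min(votes, key=lambda name: (-votes[name], name))


# Hypothetical labelled projects: two normalised features each.
train: List[Example] = [
    ([0.10, 0.20], ["SCA_A"]),
    ([0.20, 0.10], ["SCA_A", "SCA_B"]),
    ([0.15, 0.25], ["SCA_A"]),
    ([0.90, 0.80], ["SCA_C"]),
    ([0.80, 0.90], ["SCA_C"]),
    ([0.85, 0.95], ["SCA_B", "SCA_C"]),
]
```

A random forest would instead learn feature thresholds and can rank features by importance, which fits the preference-mining goal better; the nearest-neighbour vote is used here only because it fits in a few stdlib-only lines.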
