WITNESS: A lightweight and practical approach to fine-grained predictive mutation testing
Zeyu Lu, Peng Zhang, Chun Yong Chong, Shan Gao, Yibiao Yang, Yanhui Li, Lin Chen, Yuming Zhou
TL;DR
WITNESS tackles the practicality gap in fine-grained predictive mutation testing by using lightweight classical ML to predict the kill matrix for both inside- and outside-method mutants. It leverages 21 engineered features from source code, changes, and test cases, and trains Random Forest and LightGBM models, combined in an ensemble, to produce the kill matrix and support downstream tasks such as test-case prioritization (SAPIENT). Empirically, WITNESS achieves state-of-the-art predictive performance on Defects4J with drastic efficiency gains over deep-learning baselines, and the feature analysis highlights the crucial role of pre/post-mutation change information. The approach broadens applicability, enables real-world adoption, and offers a practical baseline for future PMT research and test-suite augmentation.
Abstract
Existing fine-grained predictive mutation testing studies predominantly rely on deep learning, which faces two critical limitations in practice: (1) Exorbitant computational costs. The deep learning models adopted in these studies demand significant computational resources to accelerate training and inference. This introduces high costs and undermines the cost-reduction goal of predictive mutation testing. (2) Constrained applicability. Although modern mutation testing tools generate mutants both inside and outside methods, current fine-grained predictive mutation testing approaches handle only inside-method mutants. As a result, they cannot predict outside-method mutants, limiting their applicability in real-world scenarios. We propose WITNESS, a new fine-grained predictive mutation testing approach. WITNESS adopts a twofold design: (1) With collected features from both inside-method and outside-method mutants, WITNESS is suitable for all generated mutants. (2) Instead of using computationally expensive deep learning, WITNESS employs lightweight classical machine learning models for training and prediction. This makes it more cost-effective and enables straightforward explanations of the decision-making processes behind the adopted models. Evaluations on Defects4J projects show that WITNESS consistently achieves state-of-the-art predictive performance across different scenarios. Additionally, WITNESS significantly enhances the efficiency of kill matrix prediction. Post-hoc analysis reveals that features incorporating information from before and after the mutation are the most important among those used in WITNESS. Test case prioritization based on the predicted kill matrix shows that WITNESS delivers results much closer to those obtained by using the actual kill matrix, outperforming baseline approaches.
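To make the notion of a predicted kill matrix and its use in test case prioritization concrete, the following is a minimal sketch and not the authors' implementation. It assumes that two models each output a kill probability per (test, mutant) pair, that the ensemble is a plain average thresholded at 0.5, and that prioritization uses a greedy "additional" strategy; none of these choices is specified in the abstract. The names `build_kill_matrix`, `prioritize_tests`, `probs_rf`, and `probs_gbm` are hypothetical, and the probabilities are made-up toy values.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (test name, mutant id)


def build_kill_matrix(
    probs_a: Dict[Pair, float],
    probs_b: Dict[Pair, float],
    threshold: float = 0.5,
) -> Dict[str, Set[str]]:
    """Average two models' kill probabilities and threshold them.

    Assumption: simple averaging stands in for the ensemble step.
    Returns a mapping test -> set of mutants predicted to be killed.
    """
    matrix: Dict[str, Set[str]] = {}
    for (test, mutant), p_a in probs_a.items():
        p = (p_a + probs_b[(test, mutant)]) / 2.0
        matrix.setdefault(test, set())  # keep tests that kill nothing
        if p >= threshold:
            matrix[test].add(mutant)
    return matrix


def prioritize_tests(kill_matrix: Dict[str, Set[str]]) -> List[str]:
    """Greedy 'additional' ordering: repeatedly pick the test that kills
    the most mutants not yet killed by earlier tests (ties broken by name)."""
    remaining = dict(kill_matrix)
    killed: Set[str] = set()
    order: List[str] = []
    while remaining:
        best = min(remaining, key=lambda t: (-len(remaining[t] - killed), t))
        order.append(best)
        killed |= remaining.pop(best)
    return order


# Toy probabilities (hypothetical) for three tests and two mutants.
probs_rf = {
    ("testA", "m1"): 0.9, ("testA", "m2"): 0.2,
    ("testB", "m1"): 0.6, ("testB", "m2"): 0.8,
    ("testC", "m1"): 0.1, ("testC", "m2"): 0.3,
}
probs_gbm = {
    ("testA", "m1"): 0.7, ("testA", "m2"): 0.4,
    ("testB", "m1"): 0.8, ("testB", "m2"): 0.6,
    ("testC", "m1"): 0.3, ("testC", "m2"): 0.1,
}

predicted = build_kill_matrix(probs_rf, probs_gbm)
order = prioritize_tests(predicted)
print(order)  # ['testB', 'testA', 'testC']
```

In this toy case testB is predicted to kill both mutants, so it runs first; the remaining tests add no new kills and fall back to name order. Replacing `predicted` with the actual kill matrix would give the reference ordering against which a predicted ordering can be compared.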
