What You See Is What You Get: Attention-based Self-guided Automatic Unit Test Generation
Xin Yin, Chao Ni, Xiaodan Xu, Xiaohu Yang
TL;DR
AUGER addresses software defects by coupling a high-confidence defect-detection stage with an attention-guided unit-test generation stage that is aware of the likely defect location. The defect detector builds on a code encoder (UniXcoder) and gains robustness through adversarial and contrastive learning, while the test generator uses an attention profiling mechanism to steer LLMs toward defect-relevant statements. On Bears, Bugs.jar, and Defects4J, AUGER achieves higher defect-detection scores than prior approaches and substantially increases the number of error-triggering tests, with high precision; the results are further validated on real-world post-2023 projects. The resulting WYSIWYG framework offers explainable defect assessment and targeted testing for software quality assurance, with potential for adoption in real-world development workflows.
Abstract
Software defects heavily affect software functionality and may cause huge losses. Recently, many AI-based approaches have been proposed to detect defects; they fall into two categories: software defect prediction and automatic unit test generation. While these approaches have made great progress in software defect detection, they still have several limitations in practical application, including the low confidence of prediction models and the inefficiency of unit testing models. To address these limitations, we propose a WYSIWYG (i.e., What You See Is What You Get) approach: Attention-based Self-guided Automatic Unit Test GenERation (AUGER), which consists of two stages: defect detection and error triggering. In the first stage, AUGER predicts whether the code is defect-prone. In the second stage, it uses the critical information obtained in the first stage to guide the generation of unit tests that trigger the corresponding error. To evaluate the effectiveness of AUGER, we conduct a large-scale experiment comparing it with state-of-the-art (SOTA) approaches on widely used datasets (i.e., Bears, Bugs.jar, and Defects4J). In defect detection, AUGER improves F1-score by 4.7% to 35.3% and Precision by 17.7% to 40.4%; in unit test generation, it triggers 23 to 84 more errors than the SOTA approaches. In addition, we conduct a further study on a new dataset collected from real-world projects to verify how well AUGER generalizes in practical usage.
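The two-stage flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Detection` record, the per-line attention scores, the 0.5 threshold, the top-k selection, and the prompt wording are all hypothetical names and values chosen for illustration. The sketch only shows the general idea that the detector's output (a defect probability plus attention over lines) decides whether tests are requested and which statements the generator is pointed at.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    """Hypothetical stage-1 output; not a structure taken from the paper."""
    defect_probability: float    # assumed confidence that the method is defective
    line_attention: List[float]  # assumed attention score per source line


def top_attended_lines(scores: List[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring lines, in source order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def build_test_prompt(source_lines: List[str], detection: Detection,
                      threshold: float = 0.5, k: int = 2) -> Optional[str]:
    """Build a test-generation prompt only when stage 1 flags the code.

    The threshold, k, and prompt wording are illustrative assumptions.
    """
    if len(source_lines) != len(detection.line_attention):
        raise ValueError("expected one attention score per source line")
    if detection.defect_probability < threshold:
        return None  # stage 1 did not flag the method, so skip stage 2
    focus = set(top_attended_lines(detection.line_attention, k))
    marked = [
        f"{'>>' if i in focus else '  '} {line}"
        for i, line in enumerate(source_lines)
    ]
    header = "Write a unit test that triggers the defect near the lines marked '>>':"
    return header + "\n" + "\n".join(marked)
```

For example, given a three-line method whose division statement receives the highest attention, `build_test_prompt` with `k=1` marks only that statement; for a method scored below the threshold it returns `None`, so no test generation is attempted.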
