SAGA: Summarization-Guided Assert Statement Generation
Yuwei Zhang, Zhi Jin, Zejun Wang, Ying Xing, Ge Li
TL;DR
The paper addresses the challenge of generating meaningful assert statements in automated test case generation by using developer-written summaries of focal methods as guidance. It proposes SAGA, a CodeT5-based multi-modal framework that separately encodes test prefixes, focal methods, and natural-language summaries to generate assert statements. The authors construct CAPS-derived datasets and report that incorporating summarization yields significant improvements over state-of-the-art baselines in accuracy and text-quality metrics, supported by ablation studies and case analyses. The work aims to make automated testing more practical by generating more faithful assertions, and it suggests future directions such as augmenting the approach with static analysis and integrating it into IDEs to help developers write tests more efficiently.
Abstract
Generating meaningful assert statements is one of the key challenges in automated test case generation, because it requires understanding the intended functionality of the tested code. Recently, deep learning-based models have shown promise in improving the performance of assert statement generation. However, existing models rely only on the test prefixes and their corresponding focal methods, and ignore the developer-written summarization. Based on our observations, the summarization usually expresses the intended program behavior or contains parameters that appear directly in the assert statement. Such information can help existing models overcome their current inability to predict assert statements accurately. This paper presents a novel summarization-guided approach for automatically generating assert statements. To derive generic representations for natural language (i.e., summarization) and programming language (i.e., test prefixes and focal methods), we use a pre-trained language model as the reference architecture and fine-tune it on the task of assert statement generation. To the best of our knowledge, the proposed approach is the first attempt to use the summarization of focal methods as guidance for making generated assert statements more accurate. We demonstrate the effectiveness of our approach on two real-world datasets in comparison with state-of-the-art models.
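The observation that a summarization may contain values that appear directly in the assert statement can be illustrated with a small sketch. This is not SAGA's model or data pipeline. The `AssertSample` class, the `summary_only_hints` function, and the Java snippets in the example are hypothetical and were made up for illustration. The sketch only checks which tokens of a reference assertion occur in the summary but in neither code input.

```python
import re
from dataclasses import dataclass


@dataclass
class AssertSample:
    """Hypothetical container for the three inputs named in the abstract,
    plus the reference assert statement."""
    test_prefix: str
    focal_method: str
    summary: str
    assertion: str


def tokens(text):
    """Split text into lowercased identifiers and integer literals."""
    return {t.lower() for t in re.findall(r"[A-Za-z_]\w*|\d+", text)}


def summary_only_hints(sample):
    """Return assertion tokens found in the summary but in neither
    the test prefix nor the focal method."""
    code_tokens = tokens(sample.test_prefix) | tokens(sample.focal_method)
    shared_with_summary = tokens(sample.assertion) & tokens(sample.summary)
    return shared_with_summary - code_tokens


# Made-up example: the expected value 0 is stated only in the summary.
sample = AssertSample(
    test_prefix="Stack<Integer> stack = new Stack<>(); int size = stack.size();",
    focal_method="public int size() { return count; }",
    summary="Returns the number of elements, which is 0 for an empty stack.",
    assertion="assertEquals(0, size);",
)

hints = summary_only_hints(sample)
print(hints)
```

In this made-up sample, the expected value in the assertion is stated in the summary but not in either code input, which is the kind of case where a model restricted to the test prefix and focal method would have to guess.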
