Clarifying Semantics of In-Context Examples for Unit Test Generation
Chen Yang, Lin Yang, Ziqi Wang, Dong Wang, Jianyi Zhou, Junjie Chen
TL;DR
CLAST improves the semantic clarity of unit tests that serve as in-context examples for LLM-based test generation. It refines tests in two stages: test purification, which isolates single-scenario tests, and textual clarity enhancement, which combines prompting with program analysis to produce clearer comments and identifiers while preserving the effectiveness of the original tests. Across seven Java projects, CLAST outperformed UTgen both in preserving test effectiveness and in improving semantic clarity. When CLAST-refined tests were used as in-context examples, compilation success rate (CSR), pass rate (PR), and test coverage (Cov) improved for both RAGGen and TELPA. The work demonstrates practical impact for LLM-based software testing and suggests avenues for future integration with larger models and runtime information.
Abstract
Recent advances in large language models (LLMs) have enabled promising performance in unit test generation through in-context learning (ICL). However, the quality of in-context examples significantly influences the effectiveness of generated tests: poorly structured or semantically unclear test examples often lead to suboptimal outputs. In this paper, we propose CLAST, a novel technique that systematically refines unit tests to improve their semantic clarity, thereby enhancing their utility as in-context examples. The approach decomposes complex tests into logically clearer ones and improves semantic clarity through a combination of program analysis and LLM-based rewriting. We evaluated CLAST on four open-source and three industrial projects. The results demonstrate that CLAST largely outperforms UTgen, the state-of-the-art refinement technique, in both preserving test effectiveness and enhancing semantic clarity. Specifically, CLAST fully retains the original effectiveness of unit tests, while UTgen reduces compilation success rate (CSR), pass rate (PR), test coverage (Cov), and mutation score (MS) by an average of 12.90%, 35.82%, 4.65%, and 5.07%, respectively. Over 85.33% of participants in our user study preferred the semantic clarity of CLAST-refined tests. Notably, incorporating CLAST-refined tests as examples effectively improves ICL-based unit test generation approaches such as RAGGen and TELPA, resulting in an average increase of 25.97% in CSR, 28.22% in PR, and 45.99% in Cov for generated tests, compared to incorporating UTgen-refined tests. The insights from the follow-up user study not only reinforce CLAST's potential impact in software testing practice but also illuminate avenues for future research.
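To make the idea of decomposing a complex test into logically clearer ones concrete, the following minimal sketch contrasts a multi-scenario test with a hand-written refinement of it. It is an illustration only, not CLAST's algorithm or output: the paper evaluates Java projects, while Python is used here for brevity, and the `Stack` class and all test names are hypothetical.

```python
# Hypothetical class under test (an assumption for illustration, not from the paper).
class Stack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def is_empty(self):
        return not self._items


# Before refinement: one test mixes three scenarios and uses
# uninformative names, so its intent is hard to read off.
def test1():
    s = Stack()
    assert s.is_empty()
    s.push(1)
    assert s.pop() == 1
    try:
        s.pop()
    except IndexError:
        return
    raise AssertionError("expected IndexError")


# After refinement: each test covers a single scenario and carries a
# descriptive name and comment. Together they check the same behavior.
def test_new_stack_is_empty():
    # Scenario: a freshly created stack holds no items.
    stack = Stack()
    assert stack.is_empty()


def test_pop_returns_last_pushed_item():
    # Scenario: pop returns the most recently pushed item.
    stack = Stack()
    stack.push(1)
    assert stack.pop() == 1


def test_pop_on_empty_stack_raises_index_error():
    # Scenario: popping an empty stack is reported as an error.
    stack = Stack()
    try:
        stack.pop()
    except IndexError:
        return
    raise AssertionError("expected IndexError")
```

The refined tests keep every check made by `test1`, which mirrors the stated goal of improving clarity while retaining the effectiveness of the original test.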
