Chat-like Asserts Prediction with the Support of Large Language Model
Han Wang, Han Hu, Chunyang Chen, Burak Turhan
TL;DR
This work tackles the challenge of generating meaningful Python assert statements for unit tests by introducing CLAP, a chat-like LLM-based approach that combines persona-based prompting, Chain-of-Thought reasoning, one-shot learning, and a feedback loop with a Python interpreter. It constructs a Python assert statement dataset from GitHub projects and shows that CLAP reaches about 64% accuracy on single assert statement generation and about 62% on overall assert statement generation, outperforming state-of-the-art baselines. The method generalizes across multiple LLMs, with practical implications for IDE and CI integration, and a qualitative study suggests enhanced readability and potential bug fixes in real-world tests. The work also provides a public dataset and tool for the SE community, and outlines future directions for broader language support and end-to-end test-generation pipelines.
Abstract
Unit testing is an essential component of software testing, and assert statements play an important role in determining whether the tested function operates as expected. Although research has explored automated test case generation, generating meaningful assert statements remains an ongoing challenge. While several studies have investigated assert statement generation in Java, limited work addresses this task in popular dynamically-typed programming languages such as Python. In this paper, we introduce Chat-like execution-based Asserts Prediction (CLAP), a novel Large Language Model-based approach for generating meaningful assert statements for Python projects. CLAP uses persona, Chain-of-Thought, and one-shot learning techniques in its prompt design, and conducts rounds of communication with the LLM and a Python interpreter to generate meaningful assert statements. We also present a Python assert statement dataset mined from GitHub. Our evaluation demonstrates that CLAP achieves 64.7% accuracy for single assert statement generation and 62% for overall assert statement generation, outperforming existing approaches. We also analyze the mismatched assert statements, which may still share the same functionality as the ground truth, and discuss how CLAP could help automated Python unit test generation. The findings indicate that CLAP has the potential to benefit the SE community through more practical usage scenarios.
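The rounds of communication between the LLM and the Python interpreter can be pictured with a minimal sketch. This is not the authors' implementation: the function names (`run_candidate`, `refine_assert`, `stub_llm`), the round limit, and the shape of the feedback message are all assumptions made for illustration, and the LLM is replaced by a hard-coded stub so the example runs on its own.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_candidate(test_prefix: str, assert_stmt: str) -> Tuple[bool, str]:
    """Execute the test prefix followed by one candidate assert statement.

    Returns (True, "") if nothing is raised, otherwise (False, error summary).
    """
    namespace: dict = {}
    try:
        exec(test_prefix + "\n" + assert_stmt, namespace)
    except Exception:
        # Use the last traceback line (e.g. "AssertionError") as feedback.
        return False, traceback.format_exc().strip().splitlines()[-1]
    return True, ""


def refine_assert(
    propose: Callable[[str, str], str],
    test_prefix: str,
    max_rounds: int = 3,  # assumed limit, not taken from the paper
) -> Optional[str]:
    """Ask `propose` for an assert, run it, and feed errors back until it passes."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = propose(test_prefix, feedback)
        ok, feedback = run_candidate(test_prefix, candidate)
        if ok:
            return candidate
    return None  # no executable assert found within the round limit


def stub_llm(test_prefix: str, feedback: str) -> str:
    """Hypothetical stand-in for an LLM call: wrong at first, fixed after feedback."""
    return "assert result == 4" if feedback else "assert result == 5"


if __name__ == "__main__":
    print(refine_assert(stub_llm, "result = 2 + 2"))
```

In this sketch the interpreter acts only as a pass/fail oracle with a one-line error summary; a real prompt would presumably carry richer context, such as the focal function and the persona and Chain-of-Thought instructions mentioned above.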
