TestWeaver: Execution-aware, Feedback-driven Regression Testing Generation with Large Language Models
Cuong Chi Le, Cuong Duc Van, Tung Duy Vu, Thai Minh Pham Vu, Hoang Nhat Phan, Huy Nhat Phan, Tien N. Nguyen
TL;DR
TestWeaver tackles the coverage plateau in LLM-driven regression test generation by integrating lightweight program analysis into the prompting loop. It combines backward dynamic slicing to produce focused code slices, a closest-test retrieval strategy to ground the LLM in relevant execution context, and in-line execution annotations to reveal runtime states. Empirically, TestWeaver achieves higher line ($68\%$) and branch ($54\%$) coverage on 35 Python projects in the CM suite than state-of-the-art baselines, while also reducing token costs and reaching peak coverage about $2.76\times$ faster. The results support a new direction that blends static and dynamic program analysis with LLMs to improve the efficiency, scalability, and effectiveness of test generation, with implications for broader program-analysis–assisted AI tooling.
Abstract
While recent advances in large language models (LLMs) have shown promise in automating test generation for regression testing, LLMs often reason poorly about program execution, which causes coverage growth to stagnate, a phenomenon known as the coverage plateau. This paper presents TestWeaver, a novel LLM-based approach that integrates lightweight program analysis to create a focused execution context that helps LLMs generate better tests. TestWeaver combines three components to overcome LLMs' limited reasoning about complex execution: (1) it reduces hallucinations and improves focus by supplying the LLM with the backward slice from the target line instead of the full program context; (2) it identifies and incorporates close test cases, those that share control-flow similarities with the path to the target line, to provide focused execution context within the LLM's context window; and (3) it enhances the LLM's reasoning with in-line execution annotations that encode variable states as comments along the executed path. By equipping LLMs with these targeted and contextualized inputs, TestWeaver improves coverage-guided test generation and mitigates redundant exploration. Empirical results show that TestWeaver accelerates code coverage growth and generates more effective test cases than state-of-the-art approaches.
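To make component (2) concrete, the following is a minimal sketch of closest-test retrieval. It is not TestWeaver's implementation: the abstract does not state how control-flow similarity is measured, so this sketch assumes each existing test is summarized by the set of line numbers it executes and ranks tests by Jaccard overlap with the lines on the path to the target. The function name `closest_test` and the example traces are hypothetical.

```python
from typing import Dict, Set, Tuple


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity of two line-number sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_test(
    path_to_target: Set[int],
    test_traces: Dict[str, Set[int]],
) -> Tuple[str, float]:
    """Pick the existing test whose executed lines best overlap the target path.

    Assumption: similarity is Jaccard overlap of executed line sets.
    Ties are broken by test name so the result is deterministic.
    """
    if not test_traces:
        raise ValueError("no existing tests to retrieve from")
    scored = [
        (jaccard(path_to_target, lines), name)
        for name, lines in test_traces.items()
    ]
    # Highest score first; among equal scores, alphabetically first name.
    best_score, best_name = min(scored, key=lambda s: (-s[0], s[1]))
    return best_name, best_score


# Hypothetical data: lines on the path to an uncovered target line,
# and the lines executed by three existing tests.
path = {1, 2, 3, 5, 8}
traces = {
    "test_a": {1, 2, 3, 4},
    "test_b": {1, 2, 3, 5, 6},
    "test_c": {9, 10},
}
name, score = closest_test(path, traces)
print(name, round(score, 3))
```

In this toy setup, the selected test's source and trace would be placed in the prompt alongside the backward slice, so the LLM starts from an execution that already reaches most of the path and only has to change the inputs that steer the remaining branches.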
