TestForge: Feedback-Driven, Agentic Test Suite Generation

Kush Jain; Claire Le Goues

TL;DR

TestForge is an agentic, feedback-driven framework for automated unit-test generation. It starts from a zero-shot test suite and iteratively refines it at the file level using execution and coverage feedback. Operating within OpenHands with a cost-aware loop, TestForge achieves state-of-the-art results on the TestGenEval benchmark (84.3% pass@1, 44.4% line coverage, 33.8% mutation score) at a cost of $0.63 per file. The approach outperforms classical genetic-programming baselines and one-shot LLM baselines, and it yields more readable and maintainable tests than prior search-based methods. The work demonstrates how dynamic feedback and planning can scale high-quality test generation to large, real-world codebases, and it provides a reproducible benchmark through OpenHands integration.

Abstract

Automated test generation holds great promise for alleviating the burdens of manual test creation. However, existing search-based techniques compromise on test readability, while LLM-based approaches are prohibitively expensive in practice. We present TestForge, an agentic unit testing framework designed to cost-effectively generate high-quality test suites for real-world code. Our key insight is to reframe LLM-based test generation as an iterative process. TestForge thus begins with tests generated via zero-shot prompting, and then continuously refines those tests based on feedback from test executions and coverage reports. We evaluate TestForge on TestGenEval, a real-world unit test generation benchmark sourced from 11 large-scale open-source repositories; we show that TestForge achieves a pass@1 rate of 84.3%, 44.4% line coverage, and a 33.8% mutation score on average, outperforming prior classical approaches and a one-iteration LLM-based baseline. TestForge produces more natural and understandable tests than state-of-the-art search-based techniques, and offers substantial cost savings over LLM-based techniques (at $0.63 per file). Finally, we release a version of TestGenEval integrated with the OpenHands platform, a popular open-source framework featuring a diverse set of software engineering agents and agentic benchmarks, for future extension and development.
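The iterative process described in the abstract (a zero-shot suite, then repeated refinement from execution and coverage feedback) can be illustrated with a minimal sketch. This is not TestForge's actual implementation: the `Feedback` type, the `generate` and `evaluate` callables, and the stopping rule are all hypothetical names chosen for illustration, and the real agent runs inside OpenHands with its own tooling.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Feedback:
    """Hypothetical summary of one test-suite run (not TestForge's own type)."""
    passed: bool      # did the suite run without failures?
    coverage: float   # line coverage of the file under test, in [0, 1]
    report: str       # execution errors and uncovered lines, as text


def refine_suite(
    generate: Callable[[Optional[str], Optional[Feedback]], str],
    evaluate: Callable[[str], Feedback],
    max_iterations: int = 5,
    target_coverage: float = 1.0,
) -> Tuple[str, Feedback]:
    """Start from a zero-shot suite, then refine it using feedback.

    `generate(None, None)` stands in for the zero-shot prompt;
    `generate(suite, feedback)` stands in for one refinement step.
    `max_iterations` is an assumed budget that bounds the cost of the loop.
    """
    best_suite = generate(None, None)          # zero-shot starting point
    best = evaluate(best_suite)
    for _ in range(max_iterations):
        if best.passed and best.coverage >= target_coverage:
            break                              # nothing left to improve
        candidate = generate(best_suite, best)
        feedback = evaluate(candidate)
        # Prefer passing suites first, then higher coverage.
        if (feedback.passed, feedback.coverage) > (best.passed, best.coverage):
            best_suite, best = candidate, feedback
    return best_suite, best
```

In practice, `generate` would wrap an LLM call and `evaluate` would run the tests and collect a coverage report; keeping both as injected callables makes the loop itself easy to exercise with stubs.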
