Navigating the Labyrinth: Path-Sensitive Unit Test Generation with Large Language Models
Dianshu Liao, Xin Yin, Shidong Pan, Chao Ni, Zhenchang Xing, Xiaoyu Sun
TL;DR
JUnitGenie addresses path-sensitive unit test generation for Java. It extracts rich code context (types, control flow, data dependencies), distills that context into path-specific prompts, and guides LLMs through an iterative refinement loop to produce high-coverage, executable tests. It combines a Code Knowledge Base with structured prompting and a generate–validate–repair workflow that handles compilation and runtime errors. In a large-scale evaluation over 2,258 focal methods from Defects4J projects, JUnitGenie achieves a 69.88% valid-test success rate and consistently outperforms both heuristic and prior LLM-based baselines in branch and line coverage. Its generated tests also uncovered real-world bugs that developers subsequently fixed. The results indicate that targeted, path-aware context combined with refinement yields robust test generation across models, including strong performance with open-source LLMs, which makes the approach practical for software testing at scale.
Abstract
Unit testing is essential for software quality assurance, yet writing and maintaining tests remains time-consuming and error-prone. To address this challenge, researchers have proposed various techniques for automating unit test generation, including traditional heuristic-based methods and more recent approaches that leverage large language models (LLMs). However, these existing approaches are inherently path-insensitive: they rely on fixed heuristics or limited contextual information and fail to reason about deep control-flow structures. As a result, they often struggle to achieve adequate coverage, particularly for deep or complex execution paths. In this work, we present JUnitGenie, a path-sensitive framework that fills this gap by combining code knowledge with the semantic capabilities of LLMs to guide context-aware unit test generation. After extracting code knowledge from Java projects, JUnitGenie distills this knowledge into structured prompts to guide the generation of high-coverage unit tests. We evaluate JUnitGenie on 2,258 complex focal methods from ten real-world Java projects. The results show that JUnitGenie generates valid tests and improves branch and line coverage by 29.60% and 31.00%, respectively, on average over both heuristic and LLM-based baselines. We further demonstrate that the generated test cases can uncover real-world bugs, which were later confirmed and fixed by developers.
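To make "path-sensitive" concrete, the following is a minimal illustrative sketch, not JUnitGenie's implementation. The class PathSketch, the focal method route, and the helper coverAllPaths are hypothetical names invented for this example. It shows a focal method with nested conditions and one input chosen per execution path, which is the kind of per-path targeting the abstract describes. Plain Java checks stand in for JUnit assertions so the sketch stays self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example; names are illustrative and do not come from the paper.
class PathSketch {

    // Focal method with one exceptional path and four normal paths.
    static String route(String mode, int retries, boolean secure) {
        if (mode == null) {
            throw new IllegalArgumentException("mode must not be null");
        }
        if (mode.equals("batch")) {
            if (retries > 3) {
                if (secure) {
                    // Deepest path: reached only when all three conditions hold.
                    return "batch-secure-retry";
                }
                return "batch-retry";
            }
            return "batch";
        }
        return "default";
    }

    // One input per path, keyed by the path condition that input satisfies.
    static Map<String, String> coverAllPaths() {
        Map<String, String> results = new LinkedHashMap<>();
        results.put("mode != batch", route("single", 0, false));
        results.put("batch, retries <= 3", route("batch", 3, true));
        results.put("batch, retries > 3, !secure", route("batch", 4, false));
        results.put("batch, retries > 3, secure", route("batch", 4, true));
        try {
            route(null, 0, false);
            results.put("mode == null", "no exception");
        } catch (IllegalArgumentException e) {
            results.put("mode == null", "IllegalArgumentException");
        }
        return results;
    }

    public static void main(String[] args) {
        coverAllPaths().forEach(
            (path, outcome) -> System.out.println(path + " -> " + outcome));
    }
}
```

A generator that ignores path conditions could easily produce only inputs such as route("single", 0, false), leaving the three nested branches unexercised. Listing the condition for each path first, then choosing an input that satisfies it, is what lets every branch be reached.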
