TestDecision: Sequential Test Suite Generation via Greedy Optimization and Reinforcement Learning

Guoqing Wang, Chengran Yang, Xiaoxuan Zhou, Zeyu Sun, Bo Wang, David Lo, Dan Hao

Abstract

With the rapid evolution of LLMs, automated software testing is witnessing a paradigm shift. While proprietary models like GPT-4o demonstrate impressive capabilities, their high deployment costs and data privacy concerns make open-source LLMs the practical imperative for many academic and industrial scenarios. Automated test generation has evolved toward iterative LLM-based workflows that construct test suites. When using open-source LLMs in such workflows, we empirically observe that they lack a suite-level perspective and suffer from structural myopia: they fail to generate new tests with large marginal gain given the current coverage status. In this paper, taking a sequential perspective, we formalize test suite generation as an MDP and demonstrate that its objective exhibits monotone submodularity, which enables an effective relaxation of this NP-hard global optimization into a tractable step-wise greedy procedure. Guided by this insight, we propose TestDecision, which transforms LLMs into neural greedy experts. TestDecision consists of two synergistic components: (1) an inference framework that constructs the test suite following a step-wise greedy strategy; and (2) a reinforcement learning training pipeline that equips the base LLM with the sequential test generation ability needed to maximize marginal gain. Comprehensive evaluations on the ULT benchmark demonstrate that TestDecision significantly outperforms existing advanced methods. It improves branch coverage by 38.15-52.37% and execution pass rate by 298.22-558.88% across all base models, and with a 7B backbone it achieves performance comparable to the much larger proprietary LLM GPT-5.2. Furthermore, TestDecision finds 58.43-95.45% more bugs than vanilla base LLMs and exhibits superior generalization on LiveCodeBench, demonstrating its capability to construct high-quality test suites.

Paper Structure

This paper contains 34 sections, 1 theorem, 7 equations, 4 figures, 5 tables.

Key Result

Theorem 1

Let $\pi_{greedy}$ be a policy that selects the action maximizing the immediate marginal gain at each step $t$: $a_t = \mathop{\arg\max}_{a \in \mathcal{U}} \left( F(S_{t-1} \cup \{a\}) - F(S_{t-1}) \right)$. Under the independence assumption, the test suite $S_K$ generated by $\pi_{greedy}$ guarante
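The greedy selection rule in the theorem can be illustrated with a small sketch. It assumes, purely for illustration, that $F(S)$ is the number of distinct branches covered by suite $S$ and that the candidate set $\mathcal{U}$ is a small fixed pool; the names `coverage`, `greedy_suite`, and `CANDIDATES` are hypothetical and do not come from the paper.

```python
from typing import Dict, FrozenSet, List, Optional, Set


def coverage(suite: List[str], covers: Dict[str, FrozenSet[int]]) -> int:
    """F(S): number of distinct branches covered by the tests in `suite`."""
    covered: Set[int] = set()
    for name in suite:
        covered |= covers[name]
    return len(covered)


def greedy_suite(covers: Dict[str, FrozenSet[int]], k: int) -> List[str]:
    """Pick up to k tests, each maximizing F(S + [a]) - F(S) at its step."""
    suite: List[str] = []
    for _ in range(k):
        base = coverage(suite, covers)
        best: Optional[str] = None
        best_gain = 0
        # Iterate in sorted order so ties are broken deterministically.
        for name in sorted(covers):
            if name in suite:
                continue
            gain = coverage(suite + [name], covers) - base
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            # No remaining candidate adds coverage; stop early.
            break
        suite.append(best)
    return suite


# Hypothetical candidate pool: test name -> branch ids it exercises.
CANDIDATES: Dict[str, FrozenSet[int]] = {
    "test_a": frozenset({1, 2, 3}),
    "test_b": frozenset({3, 4}),
    "test_c": frozenset({1, 2, 3, 4}),
    "test_d": frozenset({5}),
}
```

With this pool, `greedy_suite(CANDIDATES, 2)` first picks `test_c` (gain 4) and then `test_d` (gain 1). Once `test_c` is in the suite, `test_a` and `test_b` contribute nothing, which is the diminishing-returns behavior that submodularity describes.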

Figures (4)

  • Figure 1: Illustrative example of the sequential dependency. Test C is functionally valid, but its contribution depends entirely on whether Tests A and B already exist in the suite.
  • Figure 2: Coverage growth with respect to test suite size ($k$). All models exhibit a rapid plateau, characteristic of diminishing returns. Notably, the iter-guided lines do not distinctly separate from the iter-blind lines.
  • Figure 3: The overview of TestDecision. The framework operates as an iterative generation loop. The LLM acts as a greedy policy generator, observing the code state augmented with checked markers. The environment executes the generated test, checks its correctness, and updates the state for the next step, ensuring the LLM always targets the remaining checked frontier.
  • Figure 4: Step-wise performance trajectory. TestDecision exhibits a steeper growth curve compared to baselines.
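The generation loop described in the Figure 3 caption can be sketched as a policy/environment interaction. This is a minimal illustration under stated assumptions, not the paper's implementation: the LLM policy is replaced by a stub that picks from a fixed pool, test execution is simulated by a `passes` flag, and every name (`CandidateTest`, `State`, `step`, `make_stub_policy`, `generate_suite`, `POOL`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class CandidateTest:
    name: str
    branches: FrozenSet[int]  # branches the test would exercise
    passes: bool              # simulated execution result


@dataclass
class State:
    all_branches: FrozenSet[int]
    covered: Set[int] = field(default_factory=set)
    suite: List[str] = field(default_factory=list)

    @property
    def uncovered(self) -> Set[int]:
        return set(self.all_branches) - self.covered


def step(state: State, test: CandidateTest) -> int:
    """Environment: check the test, update the state, return the marginal gain."""
    if not test.passes:
        return 0  # failing tests are rejected and add nothing
    gain = len(test.branches - state.covered)
    if gain > 0:
        state.covered |= test.branches
        state.suite.append(test.name)
    return gain


Policy = Callable[[State], Optional[CandidateTest]]


def make_stub_policy(pool: List[CandidateTest]) -> Policy:
    """Stand-in for the LLM: propose the unseen test overlapping most with the frontier."""
    proposed: Set[str] = set()

    def policy(state: State) -> Optional[CandidateTest]:
        remaining = [t for t in pool if t.name not in proposed]
        if not remaining:
            return None
        choice = max(remaining, key=lambda t: len(t.branches & state.uncovered))
        proposed.add(choice.name)
        return choice

    return policy


def generate_suite(state: State, policy: Policy, max_steps: int) -> List[int]:
    """Run the observe/generate/execute/update loop; return per-step gains."""
    rewards: List[int] = []
    for _ in range(max_steps):
        if not state.uncovered:
            break
        test = policy(state)
        if test is None:
            break
        rewards.append(step(state, test))
    return rewards


# Hypothetical pool; t_broken targets new branches but fails when executed.
POOL = [
    CandidateTest("t_broad", frozenset({1, 2, 3}), True),
    CandidateTest("t_broken", frozenset({4, 5, 6}), False),
    CandidateTest("t_edge", frozenset({3, 4}), True),
    CandidateTest("t_rest", frozenset({5}), True),
]
```

Running `generate_suite(State(frozenset(range(1, 7))), make_stub_policy(POOL), 10)` on this pool accepts `t_broad`, rejects the failing `t_broken` with zero gain, and then accepts `t_edge` and `t_rest` with a gain of one branch each. The per-step gain plays the role of the marginal-gain signal; how the paper actually shapes its reward is not specified in this section.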

Theorems & Definitions (4)

  • Definition 1: Test Generation MDP
  • Remark 1: Hardness
  • Theorem 1: Approximation Guarantee
  • proof