Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning

Yinjie Wang; Ling Yang; Ye Tian; Ke Shen; Mengdi Wang

TL;DR

This paper introduces CURE, a co-evolving reinforcement learning framework that jointly trains a code generator and a unit-test generator without ground-truth code as supervision. By deriving a theoretically grounded reward for unit tests and a co-evolution objective, CURE improves coding performance (the resulting ReasonFlux-Coder models gain up to 5.3% in one-shot accuracy and 9.0% in Best-of-N (BoN) accuracy), enhances unit-test generation, and supports test-time scaling and agentic coding. The trained unit tester can also serve as a reward model for RL on base models. CURE further improves efficiency, most notably in long-CoT settings, where it shortens unit-test generation while maintaining or improving accuracy. Results across five benchmarks and multiple pipelines demonstrate the practical value of self-supervised co-evolution for scalable, robust, and cost-efficient AI-assisted coding and testing.
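The TL;DR mentions a reward for unit tests derived from how generated code and generated tests interact, but the exact formula is not given here. The sketch below only illustrates the general idea of turning a pass/fail execution matrix into rewards. It assumes a boolean matrix `passes[j][k]` (code `j` passes test `k`) and, as a further assumption, a correctness label for each candidate code (for example from reference tests). The helpers `code_rewards` and `unit_test_rewards` and their specific reward values are hypothetical and are not the reward derived in the paper.

```python
from typing import List


def code_rewards(passes: List[List[bool]]) -> List[float]:
    """Hypothetical code reward: the fraction of unit tests each candidate passes.

    passes[j][k] is True when candidate code j passes unit test k.
    """
    return [sum(row) / len(row) if row else 0.0 for row in passes]


def unit_test_rewards(passes: List[List[bool]], code_is_correct: List[bool]) -> List[float]:
    """Hypothetical unit-test reward (illustrative, not the paper's derived reward).

    A test that rejects any correct candidate gets -1.0, since the test itself is wrong.
    Otherwise it earns the fraction of incorrect candidates it rejects,
    or 0.0 when there are no incorrect candidates to catch.
    """
    num_tests = len(passes[0]) if passes else 0
    correct = [j for j, ok in enumerate(code_is_correct) if ok]
    incorrect = [j for j, ok in enumerate(code_is_correct) if not ok]
    rewards = []
    for k in range(num_tests):
        if any(not passes[j][k] for j in correct):
            rewards.append(-1.0)
        elif not incorrect:
            rewards.append(0.0)
        else:
            caught = sum(1 for j in incorrect if not passes[j][k])
            rewards.append(caught / len(incorrect))
    return rewards


# Rows are candidate codes, columns are generated unit tests.
matrix = [
    [True, True, False],   # code 0 (assumed correct)
    [True, False, False],  # code 1 (assumed incorrect)
    [False, False, True],  # code 2 (assumed incorrect)
]
labels = [True, False, False]
print(unit_test_rewards(matrix, labels))  # → [0.5, 1.0, -1.0]
print(code_rewards(matrix))
```

In this toy matrix, the second test catches both faulty programs and receives the highest reward, while the third test rejects the correct program and is penalized. This mirrors the intuition that a useful tester learns from the coder's mistakes.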

Abstract

We propose CURE, a novel reinforcement learning framework with a dedicated reward design that co-evolves coding and unit test generation capabilities based on their interaction outcomes, without any ground-truth code as supervision. This approach enables flexible and scalable training and allows the unit tester to learn directly from the coder's mistakes. Our resulting ReasonFlux-Coder-7B and 14B models improve code generation accuracy by 5.3% and Best-of-N accuracy by 9.0% after optimization on Qwen2.5-Instruct models, outperforming similarly sized Qwen-Coder, DeepSeek-Coder, and Seed-Coder. They extend naturally to downstream tasks such as test-time scaling and agentic coding, achieving an 8.1% improvement over the base model. For the long-CoT model, our ReasonFlux-Coder-4B consistently outperforms Qwen3-4B while achieving 64.8% inference efficiency in unit test generation. We also find that our model can serve as an effective reward model for reinforcement learning on base models. Project: https://github.com/Gen-Verse/CURE
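Best-of-N (BoN) selection is one way a unit tester can be used at test time: sample N candidate programs, run each against generated unit tests, and keep the one that passes the most. The sketch below is a generic version of that idea, not the paper's pipeline. It assumes tests are simple (arguments, expected output) pairs and candidates are plain Python callables; `passes`, `best_of_n`, and the toy tests are hypothetical, and a real system would execute untrusted generated code in a sandbox with timeouts rather than in-process.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical test format: (positional arguments, expected return value).
UnitTest = Tuple[tuple, object]


def passes(candidate: Callable, test: UnitTest) -> bool:
    """Run one candidate on one test; any exception counts as a failure."""
    args, expected = test
    try:
        return bool(candidate(*args) == expected)
    except Exception:
        return False


def best_of_n(candidates: Sequence[Callable], tests: Sequence[UnitTest]) -> int:
    """Return the index of the candidate passing the most tests.

    Ties go to the lowest index, because max() returns the first maximal item.
    """
    scores = [sum(passes(c, t) for t in tests) for c in candidates]
    return max(range(len(candidates)), key=lambda j: scores[j])


# Three sampled "solutions" to: return the sum of a and b.
candidates = [lambda a, b: a - b, lambda a, b: a + b, lambda a, b: a * b]
# Written by hand here, standing in for tester-generated unit tests.
tests = [((1, 2), 3), ((0, 0), 0), ((2, 2), 4)]
print(best_of_n(candidates, tests))  # → 1
```

Here the addition candidate passes all three tests, the multiplication candidate passes two, and the subtraction candidate passes one, so index 1 is selected. The quality of this selection depends directly on how well the generated tests separate correct from incorrect programs, which is what the co-evolved unit tester is trained to improve.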
