RefineRL: Advancing Competitive Programming with Self-Refinement Reinforcement Learning

Shaopeng Fu, Xingxing Zhang, Li Dong, Di Wang, Furu Wei

Abstract

While large language models (LLMs) have demonstrated strong performance on complex reasoning tasks such as competitive programming (CP), existing methods predominantly focus on single-attempt settings, overlooking their capacity for iterative refinement. In this paper, we present RefineRL, a novel approach designed to unleash the self-refinement capabilities of LLMs for CP problem solving. RefineRL introduces two key innovations: (1) Skeptical-Agent, an iterative self-refinement agent equipped with local execution tools to validate generated solutions against the public test cases of CP problems; the agent always maintains a skeptical attitude towards its own outputs and thereby enforces rigorous self-refinement even when validation suggests correctness. (2) A reinforcement learning (RL) recipe that incentivizes LLMs to self-refine using only standard RLVR (reinforcement learning with verifiable rewards) data, i.e., problems paired with their verifiable answers. Extensive experiments on Qwen3-4B and Qwen3-4B-2507 demonstrate that our method yields substantial gains: after our RL training, these compact 4B models, integrated with the Skeptical-Agent, not only outperform much larger 32B models but also approach the single-attempt performance of 235B models. These findings suggest that self-refinement holds considerable promise for scaling LLM reasoning, with significant potential for further advancement.
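
To make the mechanics concrete, here is a minimal sketch of such a skeptical refinement loop. It is an illustration under assumed interfaces, not the paper's implementation: the llm.generate call, the MAX_ROUNDS budget, the prompt wording, and the (input, expected-output) test format are all assumptions. The sketch shows the two behaviors described above: local execution against public test cases, and continued refinement even when every test passes.

```python
import os
import subprocess
import sys
import tempfile

MAX_ROUNDS = 4  # assumed refinement budget; the paper's setting is not shown here


def run_public_tests(code: str, tests: list[tuple[str, str]]) -> list[str]:
    """Run a candidate solution locally against public (stdin, expected-stdout)
    pairs. Returns one feedback string per failing test; empty means all passed."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    feedback = []
    try:
        for stdin, expected in tests:
            try:
                result = subprocess.run(
                    [sys.executable, path], input=stdin,
                    capture_output=True, text=True, timeout=5,
                )
            except subprocess.TimeoutExpired:
                feedback.append(f"input={stdin!r}: timed out")
                continue
            if result.stdout.strip() != expected.strip():
                feedback.append(
                    f"input={stdin!r}: expected {expected!r}, got {result.stdout!r}"
                )
    finally:
        os.unlink(path)
    return feedback


def skeptical_agent(llm, problem: str, public_tests: list[tuple[str, str]]) -> str:
    """Iteratively refine a solution, staying skeptical even when tests pass."""
    solution = llm.generate(problem)  # assumed LLM interface: prompt in, code out
    for _ in range(MAX_ROUNDS):
        failures = run_public_tests(solution, public_tests)
        if failures:
            prompt = problem + "\nYour solution failed:\n" + "\n".join(failures)
        else:
            # Skeptical step: public tests passed, but they rarely cover all
            # edge cases, so ask for a critical re-examination anyway.
            prompt = (problem + "\nAll public tests passed, but stay skeptical: "
                      "re-check edge cases, limits, and complexity, then revise if needed.")
        solution = llm.generate(prompt)
    return solution
```

The key design choice mirrored here is that a passing validation result never terminates refinement early; the agent treats public tests as necessary but not sufficient evidence of correctness.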

Paper Structure

This paper contains 19 sections, 9 equations, 7 figures, 7 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Overview of our RefineRL approach. (a) Skeptical-Agent employs local tools to evaluate generated solutions against visible public test cases and enforces rigorous self-refinement even when evaluation results suggest correctness. (b) Self-Refinement RL uses self-generated refinement trajectories produced by the Skeptical-Agent on real-world CP problems, together with a novel Squared-incentive reward function (see the sketch after this list), to advance the self-refinement capabilities of LLMs.
  • Figure 2: Training dynamics of Qwen3-4B. The left plot shows the steady increase in reward, while the right plot shows the corresponding growth in response length, reflecting the model's internalization of the skeptical reasoning process.
  • Figure 3: Training dynamics of Qwen3-4B-2507. Similar to the base model, the 2507 variant demonstrates consistent learning progress in terms of reward maximization and increased reasoning length during the RL process.
  • Figure 4: The standard system prompt used for initial solution generation and subsequent refinement steps.
  • Figure 5: Prompt for error-driven feedback construction.
  • ...and 2 more figures
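
The Squared-incentive reward function is only named in the Figure 1 caption; its exact form is not reproduced in this excerpt. The sketch below assumes one plausible reading, that the reward is the squared fraction of test cases passed; the function name and formulation are illustrative assumptions, not the paper's definition.

```python
def squared_incentive_reward(num_passed: int, num_total: int) -> float:
    """Hypothetical squared-incentive reward: the squared test pass rate.

    Squaring makes the reward superlinear in the pass rate, so fixing the
    last few failing tests is worth more than equal progress at low pass
    rates, pushing the policy toward fully correct solutions rather than
    partial credit. (Assumed form; not the paper's exact definition.)
    """
    if num_total == 0:
        return 0.0
    pass_rate = num_passed / num_total
    return pass_rate ** 2


# Example: the last two fixes are worth far more than the first two.
assert abs(squared_incentive_reward(10, 10) - squared_incentive_reward(8, 10) - 0.36) < 1e-9
assert abs(squared_incentive_reward(2, 10) - squared_incentive_reward(0, 10) - 0.04) < 1e-9
```

Under this assumed form, going from 8/10 to 10/10 passed tests adds 0.36 reward, while going from 0/10 to 2/10 adds only 0.04, concentrating the learning signal on reaching full correctness.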