ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation
Houxing Ren, Mingjie Zhan, Zhongyuan Wu, Aojun Zhou, Junting Pan, Hongsheng Li
TL;DR
ReflectionCoder uses reflection sequences, built by integrating compiler feedback, as training data to improve one-off code generation. It combines reflection self-distillation with dynamically masked distillation to distill knowledge from iterative reflection into single-shot generation, achieving state-of-the-art results on HumanEval+ and MBPP+ while generalizing to multiple languages and even non-code reasoning. The approach is validated with extensive ablations, data-source analyses, and autonomous enhancement experiments, which show robustness to masking strategies and data sources. The method does rely on high-quality reflection data from powerful models and embedding schemes, which is a practical consideration for deployment. The same idea may extend beyond code to other tasks that have long reasoning paths and are judged on their final output.
Abstract
Code generation plays a crucial role in various tasks, such as code auto-completion and mathematical reasoning. Previous work has proposed numerous methods to enhance code generation performance, including integrating feedback from the compiler. Inspired by this, we present ReflectionCoder, a novel approach that effectively leverages reflection sequences constructed by integrating compiler feedback to improve one-off code generation performance. Furthermore, we propose reflection self-distillation and dynamically masked distillation to effectively utilize these reflection sequences. Extensive experiments on three benchmarks, i.e., HumanEval (+), MBPP (+), and MultiPL-E, demonstrate that models fine-tuned with our method achieve state-of-the-art performance. Beyond the code domain, we believe this approach can benefit other domains that focus on final results and require long reasoning paths. Code and data are available at https://github.com/SenseLLM/ReflectionCoder.
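To make the term "reflection sequence" concrete, here is a minimal sketch of how such a sequence could be assembled from execution feedback. It is an illustration under stated assumptions, not the authors' pipeline: a hard-coded list of attempts stands in for model outputs, Python exceptions stand in for compiler feedback, and the names `run_candidate` and `build_reflection_sequence` are hypothetical.

```python
from typing import Dict, List


def run_candidate(code: str, test: str) -> str:
    """Execute a candidate solution and its test; return "ok" or the error text.

    Assumption: Python's own exceptions play the role of compiler feedback.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)  # define the candidate function
        exec(test, namespace)  # run the test against it
        return "ok"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"


def build_reflection_sequence(candidates: List[str], test: str) -> List[Dict[str, str]]:
    """Record each attempt with its feedback, stopping at the first success."""
    sequence = []
    for code in candidates:
        feedback = run_candidate(code, test)
        sequence.append({"code": code, "feedback": feedback})
        if feedback == "ok":
            break
    return sequence


# Hypothetical attempts: the first is buggy, the second is the corrected version.
candidates = [
    "def add(a, b):\n    return a - b\n",
    "def add(a, b):\n    return a + b\n",
]
test = "assert add(2, 3) == 5"

sequence = build_reflection_sequence(candidates, test)
final_code = sequence[-1]["code"]  # the one-off answer a distilled model should produce directly

for step in sequence:
    print(step["feedback"])
```

In this toy setting, the full list `sequence` is the kind of multi-round trace the abstract describes, while `final_code` is the single-shot output that training on such traces is meant to improve.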
