ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation

Houxing Ren, Mingjie Zhan, Zhongyuan Wu, Aojun Zhou, Junting Pan, Hongsheng Li

TL;DR

ReflectionCoder uses reflection sequences, built by integrating compiler feedback, as training data to improve one-off code generation. It combines reflection self-distillation with dynamically masked distillation to distill what is learned over iterative reflection into single-pass generation, achieving state-of-the-art results on HumanEval+ and MBPP+ and generalizing to multiple programming languages. Ablations, data-source analyses, and autonomous enhancement experiments indicate that the method is robust to the choice of masking strategy and data source. The method relies on high-quality reflection data from powerful models and on embedding schemes, which is a practical consideration for deployment. Beyond code, the authors suggest that the idea may benefit other domains that focus on final results and require long reasoning paths.

Abstract

Code generation plays a crucial role in various tasks, such as code auto-completion and mathematical reasoning. Previous work has proposed numerous methods to enhance code generation performance, including integrating feedback from the compiler. Inspired by this, we present ReflectionCoder, a novel approach that effectively leverages reflection sequences constructed by integrating compiler feedback to improve one-off code generation performance. Furthermore, we propose reflection self-distillation and dynamically masked distillation to effectively utilize these reflection sequences. Extensive experiments on three benchmarks, i.e., HumanEval (+), MBPP (+), and MultiPL-E, demonstrate that models fine-tuned with our method achieve state-of-the-art performance. Beyond the code domain, we believe this approach can benefit other domains that focus on final results and require long reasoning paths. Code and data are available at https://github.com/SenseLLM/ReflectionCoder.

Paper Structure

This paper contains 38 sections, 2 equations, 6 figures, and 12 tables.

Figures (6)

  • Figure 1: A sample of reflection sequence data containing four components: Reflection Instruction, Reflection Sequences, Instruction, and Final code.
  • Figure 2: Overview of the proposed dynamically masked distillation.
  • Figure 3: Overview of the proposed dynamic masking strategies. Here, a cell denotes a block, 'C' denotes the code block, 'E' denotes the execution block, and 'A' denotes the analysis block.
  • Figure 4: Effect of the factor of up-sample. The metric is Pass@1 accuracy, and all the results are based on Code Llama 7B.
  • Figure 5: The changes in masked rate during training.
  • ...and 1 more figure
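
The caption of Figure 4 reports Pass@1 accuracy. As a rough illustration of how that metric is commonly computed, the sketch below uses the widely used unbiased pass@k estimator, which for k = 1 reduces to the fraction of generated samples that pass all tests. This section does not state how many samples per problem the authors draw, so the function names and the sample counts here are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of samples generated for the problem (assumed input)
    c: number of those samples that pass every test
    k: the k in pass@k
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset has no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Average pass@1 over problems; results holds (n, c) pairs, one per problem."""
    return sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
```

With one sample per problem (for example, greedy decoding), `mean_pass_at_1` is simply the share of problems whose single generated solution passes all tests.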