Self-Infilling Code Generation

Lin Zheng; Jianbo Yuan; Zhi Zhang; Hongxia Yang; Lingpeng Kong

TL;DR

Self-Infilling Code Generation introduces a decoding-time framework that leverages fill-in-the-middle (FIM) training to enable self-infilling in code generation. By combining an interruption mechanism that generates a suffix first with a looping process that alternates between self-infilling and left-to-right decoding, the approach achieves non-monotonic, more structured code synthesis. Empirical results across HumanEval, MBPP, DS-1000, multilingual benchmarks, and GSM8K demonstrate improved code regularity and quality over standard left-to-right decoding, with several cases matching or exceeding specialized training. This work highlights decoding-time infilling as a scalable method to enhance code synthesis and suggests potential applications beyond code in domains requiring bidirectional context.

Abstract

This work introduces self-infilling code generation, a general framework that incorporates infilling operations into auto-regressive decoding. Our approach capitalizes on the observation that recent infilling-capable code language models can self-infill: whereas infilling operations aim to fill in the middle based on a predefined prefix and suffix, self-infilling sequentially generates both such surrounding context and the infilled content. We utilize this capability to introduce novel interruption and looping mechanisms in conventional decoding, evolving it into a non-monotonic process. Interruptions allow for postponing the generation of specific code until a definitive suffix is established, enhancing control over the output. Meanwhile, the looping mechanism, which leverages the complementary nature of self-infilling and left-to-right decoding, can iteratively update and synchronize each piece of generation cyclically. Extensive experiments are conducted to demonstrate that our proposed decoding process is effective in enhancing both regularity and quality across several code generation benchmarks.
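The control flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the sentinel strings `<PRE>`, `<SUF>`, `<MID>`, the `generate` callable, the `n_loops` parameter, and the exact order of the loop steps are all assumptions made for this example, and the rule for deciding when to interrupt is omitted because it is not specified here. The toy model exists only to make the sketch runnable.

```python
from typing import Callable

# Hypothetical sentinel strings; real FIM-trained models define their own tokens.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"

# Hypothetical model interface: takes prompt text, returns the generated continuation.
Generate = Callable[[str], str]


def self_infill(prefix: str, generate: Generate, n_loops: int = 1) -> str:
    """Generate a suffix first, infill the middle, then alternate with left-to-right passes."""
    # Interruption: request the suffix before any middle code exists.
    suffix = generate(f"{PRE}{prefix}{SUF}")
    # Self-infilling: fill the middle given the prefix and the self-generated suffix.
    middle = generate(f"{PRE}{prefix}{SUF}{suffix}{MID}")
    for _ in range(n_loops):
        # Left-to-right pass: regenerate the suffix as a plain continuation (assumed ordering).
        suffix = generate(prefix + middle)
        # Infilling pass: refresh the middle against the updated suffix.
        middle = generate(f"{PRE}{prefix}{SUF}{suffix}{MID}")
    return prefix + middle + suffix


def toy_generate(prompt: str) -> str:
    """Stand-in for a code language model, used only to exercise the control flow."""
    if prompt.endswith(MID):
        return "    total = a + b\n"  # pretend infilled middle
    return "    return total\n"       # pretend suffix / continuation


if __name__ == "__main__":
    print(self_infill("def add(a, b):\n", toy_generate, n_loops=1))
```

With `n_loops=0` the sketch reduces to a single interruption followed by one infilling step, which mirrors the $N=0$ setting in the Figure 4 caption where the looping mechanism is disabled.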

Paper Structure (48 sections, 6 equations, 22 figures, 11 tables, 3 algorithms)

Introduction
Self-infilling Code Generation
FIM Training Entails Self-infilling
Fill-in-the-middle (FIM) Training.
FIM Entails Self-infilling.
Self-infilling Interruption
Suffix Prompting.
Decoding through a Looping Mechanism
Experiments
Experimental Setup
Benchmarks.
Code Language Models.
Evaluation Protocols.
Results
Results on HumanEval and MBPP.
...and 33 more sections

Figures (22)

Figure 1: Schematic illustrations of various decoding approaches for code generation. (a) and (b) represent standard left-to-right decoding and infilling operations, respectively. Whereas infilling requires the user-provided prefix and suffix, self-infilling interruption (c) autonomously generates these segments. (d) further expands on self-infilling by incorporating a looping mechanism.
Figure 2: The distribution of degenerate solutions from self-infilling ($N\!=\!2$) versus vanilla decoding on HumanEval across various models. For each problem, 200 samples are generated using nucleus sampling with temperature 0.8 and top-$p$ 0.95.
Figure 3: Proportional distribution of changes after a second iteration of the looping mechanism ($N\!=\!2$) on the HumanEval and MBPP benchmarks with Code Llama 13B. Categories describe how the generated code changed: 'Unchanged' denotes no change during the second loop iteration, 'Changed but Remained Correct/Incorrect' denotes snippets that changed but stayed correct/incorrect, and 'Correct $\rightarrow$ Incorrect' (and vice versa) denotes snippets that changed from correct to incorrect (or from incorrect to correct).
Figure 4: Results on HumanEval and MBPP with different values of $\tau$ and $N$ on Code Llama 7B. $N=0$ indicates that the looping mechanism is disabled, and the horizontal dashed line represents the performance of the vanilla left-to-right baseline (L2R).
Figure 5: Python pseudo-code implementation of the parsing function for the left-to-right generation.
...and 17 more figures
