Planning-Aware Code Infilling via Horizon-Length Prediction

Yifeng Ding; Hantian Ding; Shiqi Wang; Qing Sun; Varun Kumar; Zijian Wang

TL;DR

This paper tackles Fill-in-the-Middle (FIM) in code generation. It identifies that standard next-token prediction ($\mathcal{L}_{\text{NTP}}$) lacks the long-horizon planning needed to coherently connect the generated middle to the right-context suffix. It introduces Horizon-Length Prediction (HLP), an auxiliary objective that predicts the number of tokens remaining in the middle segment at each generation step. HLP is implemented via an $\texttt{hlp\_head}$ and optimized with $\mathcal{L}_{\text{HLP}}$ alongside $\mathcal{L}_{\text{NTP}}$. Empirically, HLP yields up to 24% relative gains on repository-scale FIM benchmarks and up to 5% gains on syntax-aware FIM. It also improves code repair and code reasoning (by up to 18% and 6%, respectively) across multiple model families, with negligible training overhead and no inference cost. Analyses show that horizon awareness does not emerge from NTP alone and that HLP fosters lookahead planning, as evidenced by attention shifting toward the suffix context and by fewer planning failures. The work suggests that horizon-aware training could broadly improve long-horizon reasoning in code and possibly in natural language. Future work includes extending it to larger models and other generation tasks.
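To make the two-part objective concrete, here is a minimal pure-Python sketch of how an HLP head and its loss could sit next to NTP. It assumes, rather than reproduces from the paper, three things. First, the HLP head is a single linear projection from a hidden state to a scalar. Second, $\mathcal{L}_{\text{HLP}}$ is an L1 (mean absolute error) regression loss over middle positions. Third, the two losses are summed with a weight. The weight hlp_weight and all function names are hypothetical.

```python
from typing import Sequence


def hlp_head(hidden: Sequence[float], weight: Sequence[float], bias: float) -> float:
    """Hypothetical HLP head: one linear projection from a hidden state to a scalar."""
    return sum(h * w for h, w in zip(hidden, weight)) + bias


def hlp_loss(preds: Sequence[float], labels: Sequence[float]) -> float:
    """Assumed L_HLP: mean absolute (L1) error over middle positions."""
    if not preds or len(preds) != len(labels):
        raise ValueError("preds and labels must be non-empty and equally long")
    return sum(abs(p - y) for p, y in zip(preds, labels)) / len(preds)


def total_loss(ntp_loss: float, preds: Sequence[float], labels: Sequence[float],
               hlp_weight: float = 1.0) -> float:
    """Combined objective L_NTP + hlp_weight * L_HLP (hlp_weight is a hypothetical knob)."""
    return ntp_loss + hlp_weight * hlp_loss(preds, labels)


# Toy usage: two middle positions with 2-d hidden states.
hidden_states = [[0.5, -1.0], [0.2, 0.4]]
preds = [hlp_head(h, weight=[0.4, 0.1], bias=0.3) for h in hidden_states]
labels = [0.8, 0.6]  # assumed normalized remaining-token targets
loss = total_loss(ntp_loss=2.0, preds=preds, labels=labels, hlp_weight=0.5)
```

In this sketch the head reads only the final hidden states and adds a scalar loss term. Dropping it therefore leaves the underlying NTP model untouched.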

Abstract

Fill-in-the-Middle (FIM), or infilling, has become integral to code language models, enabling them to generate missing code given both left and right contexts. However, the current FIM training paradigm performs next-token prediction (NTP) over a reordered sequence, and this often leaves models struggling to generate content that aligns well with the surrounding context. We hypothesize that NTP alone is insufficient for models to learn effective planning conditioned on the distant right context, a critical factor for successful code infilling. To overcome this, we propose Horizon-Length Prediction (HLP), a novel training objective that teaches models to predict the number of remaining middle tokens at each step. HLP advances FIM with lookahead planning. It enables models to inherently learn infilling boundaries for arbitrary left and right contexts without relying on dataset-specific post-processing. Our evaluation across different model families and sizes shows that HLP significantly improves FIM performance by up to 24% (relative) on diverse file-level and repository-level benchmarks. Furthermore, the enhanced planning capability gained through HLP boosts model performance on code reasoning. Importantly, HLP incurs negligible training overhead and no additional inference cost, ensuring its practicality for real-world scenarios.
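The abstract combines two ingredients: NTP over a reordered sequence, and a per-step count of remaining middle tokens. Both can be illustrated with a small sketch that builds a prefix-suffix-middle (PSM) sequence and attaches horizon labels. The sentinel strings <PRE>, <SUF>, <MID> and <EOT> are hypothetical placeholders for a tokenizer's special tokens. The label (M - t)/M for the t-th of M middle tokens is an assumed normalization, chosen to match the "percentage of remaining future tokens" plotted in Figure 3.

```python
from typing import List, Optional, Tuple

# Hypothetical sentinel names; real tokenizers define their own special tokens.
PRE, SUF, MID, EOT = "<PRE>", "<SUF>", "<MID>", "<EOT>"


def build_psm_example(prefix: List[str], middle: List[str], suffix: List[str]
                      ) -> Tuple[List[str], List[Optional[float]]]:
    """Reorder a document into PSM order and attach horizon-length labels.

    Assumed label: after middle token x_t (1-indexed), M - t middle tokens are
    still to come; dividing by M keeps the target in [0, 1]. Positions outside
    the middle get None, meaning they are excluded from the HLP loss.
    """
    seq = [PRE, *prefix, SUF, *suffix, MID, *middle, EOT]
    labels: List[Optional[float]] = [None] * len(seq)
    m = len(middle)
    start = len(seq) - m - 1  # index of x_1, right after the <MID> sentinel
    for t in range(1, m + 1):
        labels[start + t - 1] = (m - t) / m
    return seq, labels


# A five-token middle, as in the Figure 2 example.
seq, labels = build_psm_example(["a"], ["x1", "x2", "x3", "x4", "x5"], ["b"])
```

Under this assumption, x_2 in the five-token example receives the label 3/5 = 0.6, and the last middle token x_5 receives 0.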

Paper Structure

This paper contains 25 sections, 5 equations, 4 figures, and 14 tables.

Introduction
Post-processing for Fill-in-the-Middle
Post-processing Requires Task-Specific Knowledge
LLMs Fail to Plan Coherent Completions
FIM Requires Planning Capability
Horizon-Length Prediction
Experiments
Syntax-Aware Multilingual Code FIM
Repository-Level Cross-File Code FIM
Code Repair via Fill-in-the-Middle
Code Reasoning via Fill-in-the-Middle
Discussion
NTP Alone Cannot Yield Horizon Awareness
Why Horizon-Length Prediction works?
HLP Mitigates Planning Failures in FIM
...and 10 more sections

Figures (4)

Figure 1: Successful FIM requires planning capabilities. Given the prefix and suffix, the model is asked to infill the middle part. Compared with the ground truth, the LLM fails to connect to $\mathsf{suffix}$ due to a lack of planning capability: the last part of the generation needs to connect with the member function Recognizer().
Figure 2: Overview of Horizon-Length Prediction (HLP). In this example, we set the length of $\mathsf{middle}$ to five tokens. Following the flow of arrows, we illustrate how the second token of $\mathsf{middle}$ (i.e., "$x_2$") is processed through both next-token prediction objective and horizon-length prediction objective.
Figure 3: Predicted percentage of remaining future tokens (the normalized horizon-length label defined in the paper) from models trained without and with HLP at different token positions, where each token's position is normalized to a percentage of the sequence.
Figure 4: Attention analysis of DeepSeek-Coder-Base 1.3B on SAFIM, showing the ratio of attention paid to $\mathsf{suffix}$ between models trained with and without HLP. The X-axis shows the normalized position in the sequence, and the Y-axis shows the attention ratio. Values above 1 indicate that the HLP model pays more attention to $\mathsf{suffix}$ than the baseline model. We observe that the model trained with HLP generally pays more attention to $\mathsf{suffix}$, especially at the beginning, demonstrating its lookahead planning behavior.
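The Figure 4 quantity can be approximated with the sketch below. For each query position, it takes the share of attention mass that lands on suffix keys, then divides the HLP model's share by the baseline's. The sketch assumes attention rows are already averaged over layers and heads, and it normalizes positions as a fraction of the queried span. These aggregation choices and the function names are illustrative assumptions, not the paper's exact procedure.

```python
from typing import List, Sequence, Tuple


def suffix_share(attn_row: Sequence[float], suf_start: int, suf_end: int) -> float:
    """Fraction of one query's attention mass on keys in [suf_start, suf_end)."""
    total = sum(attn_row)
    return sum(attn_row[suf_start:suf_end]) / total if total > 0 else 0.0


def suffix_attention_ratio(
    hlp_attn: Sequence[Sequence[float]],
    base_attn: Sequence[Sequence[float]],
    suf_start: int,
    suf_end: int,
    queries: Sequence[int],
) -> List[Tuple[float, float]]:
    """Return (normalized position, HLP share / baseline share) per query.

    Assumes rows are already averaged over layers and heads. In PSM order the
    suffix precedes the middle, so causal middle queries can attend to it.
    A ratio above 1 means the HLP model attends more to the suffix.
    """
    out = []
    n = len(queries)
    for k, q in enumerate(queries):
        base = suffix_share(base_attn[q], suf_start, suf_end)
        hlp = suffix_share(hlp_attn[q], suf_start, suf_end)
        ratio = hlp / base if base > 0 else float("inf")
        out.append(((k + 1) / n, ratio))  # hypothetical position normalization
    return out
```

Using a ratio rather than a difference mirrors the caption's reading rule: values above 1 mean the HLP model pays more attention to $\mathsf{suffix}$ than the baseline.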