
Execution Guided Line-by-Line Code Generation

Boaz Lavon, Shahar Katz, Lior Wolf

TL;DR

EG-CFG addresses the gap between static code generation and runtime correctness by injecting dynamic execution signals into inference. For each line, it samples multiple candidate continuations, extracts the executable ones via AST parsing, runs them against the test cases, and conditions token generation on the resulting execution traces through Classifier-Free Guidance, which steers generation toward coherent, executable outputs. The approach supports native parallelism, with multiple agents exploring diverse paths, and achieves state-of-the-art accuracy on MBPP, MBPP-ET, HumanEval, HumanEval-ET, DS-1000, and CodeContests using open-source models. Notably, strong performance is attained even with smaller models, illustrating the practical impact of runtime grounding on code quality. The work suggests broad applicability to execution-grounded generation and motivates extensions to longer-horizon planning, multi-file tasks, and formal verification.
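The candidate-filtering and test-execution step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `is_executable` and `run_tests`, the toy `add` task, and the one-line textual signal format are all assumptions made for illustration, and the traces used in the paper may be richer than a pass/fail message.

```python
import ast


def is_executable(candidate: str) -> bool:
    """Return True if the candidate source parses into a valid Python AST."""
    try:
        ast.parse(candidate)
        return True
    except SyntaxError:
        return False


def run_tests(candidate: str, tests: list[str]) -> list[str]:
    """Run the candidate, then each test statement, and record one signal per test.

    Hypothetical signal format: "<test> -> passed" or "<test> -> <ErrorType>: <message>".
    """
    signals = []
    for test in tests:
        namespace = {}  # fresh namespace so tests cannot affect each other
        try:
            exec(candidate, namespace)  # define the candidate's functions
            exec(test, namespace)       # run a single assertion
            signals.append(f"{test} -> passed")
        except Exception as exc:
            signals.append(f"{test} -> {type(exc).__name__}: {exc}")
    return signals


# Toy candidates for a hypothetical "add two numbers" task.
candidates = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a - b",
    "def add(a, b):\n    return a +",  # syntactically invalid, filtered out by the AST check
]
tests = ["assert add(1, 2) == 3"]

valid = [c for c in candidates if is_executable(c)]
feedback = {c: run_tests(c, tests) for c in valid}

for source, signals in feedback.items():
    print(source.splitlines()[-1].strip(), "=>", signals)
```

Because `exec` runs arbitrary code, anything like this should only be run on model-generated candidates inside a sandbox with time and resource limits.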

Abstract

We present a novel approach to neural code generation that incorporates real-time execution signals into the language model generation process. While large language models (LLMs) have demonstrated impressive code generation capabilities, they typically do not utilize execution feedback during inference, a critical signal that human programmers regularly leverage. Our method, Execution-Guided Classifier-Free Guidance (EG-CFG), dynamically incorporates execution signals as the model generates code, providing line-by-line feedback that guides the generation process toward executable solutions. EG-CFG employs a multi-stage process: first, we conduct beam search to sample candidate program completions for each line; second, we extract execution signals by executing these candidates against test cases; and finally, we incorporate these signals into the prompt during generation. By maintaining consistent signals across tokens within the same line and refreshing signals at line boundaries, our approach provides coherent guidance while preserving syntactic structure. Moreover, the method naturally supports native parallelism at the task level in which multiple agents operate in parallel, exploring diverse reasoning paths and collectively generating a broad set of candidate solutions. Our experiments across diverse coding tasks demonstrate that EG-CFG significantly improves code generation performance compared to standard approaches, achieving state-of-the-art results across various levels of complexity, from foundational problems to challenging competitive programming and data science tasks. Our code is available at: https://github.com/boazlavon/eg_cfg
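To make the guidance step concrete, the following sketch shows the standard Classifier-Free Guidance interpolation applied to next-token log-probabilities. It assumes two sets of logits for the same position, one computed from the plain prompt and one from a prompt augmented with execution signals. The function names, the toy three-token vocabulary, and the final renormalization are illustrative assumptions, not details taken from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cfg_log_probs(plain_logits, guided_logits, gamma):
    """Standard CFG mix: log p = log p_plain + gamma * (log p_guided - log p_plain).

    gamma = 0 ignores the execution signals, gamma = 1 uses only the
    signal-conditioned distribution, and gamma > 1 extrapolates past it.
    """
    lp_plain = log_softmax(plain_logits)
    lp_guided = log_softmax(guided_logits)
    mixed = [p + gamma * (g - p) for p, g in zip(lp_plain, lp_guided)]
    return log_softmax(mixed)  # renormalize so the result is a distribution


# Toy vocabulary of three tokens; the values are made up for illustration.
plain = [2.0, 1.0, 0.0]   # prompt without execution signals prefers token 0
guided = [1.0, 2.0, 0.0]  # prompt with execution signals prefers token 1

for gamma in (0.0, 1.0, 2.0):
    probs = [math.exp(lp) for lp in cfg_log_probs(plain, guided, gamma)]
    print(gamma, [round(p, 3) for p in probs])
```

In the scheme the abstract describes, the execution signals stay fixed for every token within a line and are refreshed at line boundaries, so the signal-conditioned logits would be recomputed from a new prompt only when a new line starts.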

Paper Structure

This paper contains 24 sections, 14 equations, 7 figures, 6 tables, 2 algorithms.

Figures (7)

  • Figure 1: MBPP & MBPP-ET performance. EG-CFG (DeepSeek-V3) sets a new state-of-the-art.
  • Figure 2: HumanEval & HumanEval-ET performance. EG-CFG (DeepSeek-V3) matches the state-of-the-art on HumanEval and sets a new state-of-the-art on HumanEval-ET.
  • Figure 3: DS-1000 performance. EG-CFG (DeepSeek-V3) achieves a new state-of-the-art, surpassing GPT-4.
  • Figure 4: CodeContests performance. EG-CFG (DeepSeek-V3) sets a new state-of-the-art, outperforming GPT-4 and GPT-4o methods.
  • Figure 5: The DeepSeek-Instruct prompt used for MBPP Task 395. This prompt includes multiple solved examples followed by the target task.
  • ...and 2 more figures