ROCODE: Integrating Backtracking Mechanism and Program Analysis in Large Language Models for Code Generation

Xue Jiang, Yihong Dong, Yongding Tao, Huanyu Liu, Zhi Jin, Wenpin Jiao, Ge Li

TL;DR

RoCode addresses error accumulation in auto-regressive code generation by integrating a backtracking mechanism with compiler-based program analysis during decoding. It incrementally detects errors, strategically rolls back to informative points, and regenerates code under constraints, all modeled with a Trie Tree to handle non-linear generation trajectories. Across six benchmarks, two languages, and nine LLMs, RoCode achieves high compilation success and substantial pass-rate gains, while reducing token costs relative to post-revising and showing robustness to the decay factor. The approach is model-agnostic and does not require additional training, offering practical gains for reliable, efficient code generation in diverse programming contexts.
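One way to picture the non-linear trajectory is a trie whose nodes are generated tokens and whose branches are alternative continuations tried after a rollback. The sketch below is illustrative only: the class name `TrieNode` and its `pruned` flag and `banned` helper are assumptions made here, not the paper's data structure.

```python
class TrieNode:
    """One generated token; each child is an alternative continuation."""

    def __init__(self, token="", parent=None):
        self.token = token
        self.parent = parent
        self.children = {}
        self.pruned = False  # set once this branch led to a detected error

    def child(self, token):
        # Reuse the existing branch if this continuation was tried before.
        if token not in self.children:
            self.children[token] = TrieNode(token, self)
        return self.children[token]

    def path(self):
        # Walk back to the root to recover the token sequence so far.
        node, tokens = self, []
        while node.parent is not None:
            tokens.append(node.token)
            node = node.parent
        return tokens[::-1]

    def banned(self):
        # Continuations that regeneration from this node should avoid.
        return sorted(t for t, c in self.children.items() if c.pruned)


root = TrieNode()
prefix = root.child("x").child("=")
prefix.child("1 +").pruned = True   # a branch that was rolled back
retry = prefix.child("1")           # a second attempt from the same prefix
print(retry.path(), prefix.banned())
```

Keeping rolled-back branches in the tree, rather than discarding them, is what lets a later regeneration step from the same prefix know which continuations have already failed.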

Abstract

Large language models (LLMs) have recently achieved impressive performance in code generation, offering programmers revolutionary assistance in software development. However, because LLMs are auto-regressive, they are susceptible to error accumulation during code generation. Once an error is produced, an LLM can only continue generating subsequent code conditioned on it, since it cannot adjust its previous outputs. Existing LLM-based approaches typically revise code after generation is complete (post-revising), which makes accumulated errors hard to resolve and wastes significant resources. Ideally, LLMs should roll back and resolve an error as soon as it occurs during generation, rather than build on the error and wait for post-revising. In this paper, we propose ROCODE, which integrates a backtracking mechanism and program analysis into LLMs for code generation. Specifically, we employ program analysis to perform incremental error detection during the generation process. When an error is detected, the backtracking mechanism is triggered, applying rollback strategies and constrained regeneration to eliminate the error early and ensure that generation continues from a correct basis. Experiments on multiple code generation benchmarks show that ROCODE significantly reduces the errors generated by LLMs, reaching a compilation pass rate of 99.1%. The test pass rate is improved by up to 23.8% compared to the best baseline approach. Compared to the post-revising baseline, the token cost is reduced by 19.3%. Moreover, our approach is model-agnostic and achieves consistent improvements across nine representative LLMs.
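The detect, roll back, and regenerate loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_rank` is a hypothetical stand-in for an LLM that returns whole-line candidates in preference order, the standard-library `codeop` module stands in for the program analyzer, and the rollback distance is a fixed number of tokens rather than a strategically chosen point.

```python
import codeop


def has_error(prefix):
    """True if the code prefix can no longer be completed into valid Python."""
    try:
        # Returns None for incomplete-but-valid input, raises on real errors.
        codeop.compile_command(prefix, symbol="exec")
    except (SyntaxError, ValueError, OverflowError):
        return True
    return False


def generate(rank, max_steps=20, rollback=1):
    """Greedy decoding with incremental checks, rollback, and banned retries."""
    tokens = []
    banned = {}  # prefix (tuple of tokens) -> continuations known to fail
    for _ in range(max_steps):
        prefix = tuple(tokens)
        choices = [t for t in rank(prefix) if t not in banned.get(prefix, set())]
        if not choices:
            break  # model finished, or every candidate here was rejected
        tokens.append(choices[0])
        if has_error("".join(tokens)):
            cut = max(0, len(tokens) - rollback)   # fixed-distance rollback (assumption)
            bad = tokens[cut]
            tokens = tokens[:cut]
            banned.setdefault(tuple(tokens), set()).add(bad)  # constrain the retry
    return "".join(tokens)


def toy_rank(prefix):
    # Hypothetical model: prefers a buggy line first, then a correct one.
    table = {
        (): ["def add(a, b):\n"],
        ("def add(a, b):\n",): ["    return a b\n", "    return a + b\n"],
    }
    return table.get(prefix, [])


print(generate(toy_rank))
```

In this toy run the first candidate for the function body is rejected as soon as it is appended, so no further tokens are generated on top of the error, which is the source of the token savings that the abstract reports relative to post-revising.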

Paper Structure

This paper contains 21 sections, 14 equations, 6 figures, 4 tables, 2 algorithms.

Figures (6)

  • Figure 1: Statistics on the types of errors in code generated by LLMs. The statistics are based on the results generated by CodeLlama-7B and CodeGen-6B on the HumanEval and MBPP benchmarks using greedy decoding.
  • Figure 2: The Overview of RoCode with Trie Tree.
  • Figure 3: The performance of RoCode on different LLMs.
  • Figure 4: The performance of RoCode with different values of the hyperparameter $\lambda$. The gray dashed line marks the hyperparameter value used in the experiments.
  • Figure 5: An example of RoCode.
  • ...and 1 more figure