
LSR-MCTS: Alleviating Long Range Dependency in Code Generation

Tingwei Lu, Yangning Li, Liyuan Wang, Binghuai Lin, Jiwei Tang, Qingsong Lv, Wanshi Xu, Hai-Tao Zheng, Yinghui Li, Xin Su, Zifei Shan

TL;DR

This work tackles long-range dependencies in code generation by switching from token-level to line-level processing and applying Monte Carlo Tree Search (MCTS) to optimize code line-by-line. It introduces LSR-MCTS, where each MCTS node comprises a context, a line, and a supplement, and uses a self-refine mechanism at each node to expand the search space and rectify errors. The approach relies on public test cases for scoring and backpropagation, pushing toward globally coherent code blocks. Experimental results on HumanEval, MBPP, and Code Contests show state-of-the-art performance across multiple code LLMs, with ablations underscoring the importance of line-level structure and per-node refinement. Limitations include runtime cost from repeated LLM calls and reliance on test-case availability, suggesting future work on efficiency and automatic test generation.
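The node structure and the four MCTS steps can be made concrete with a minimal Python sketch. It is written under stated assumptions and is not the authors' implementation. `toy_propose` is a hypothetical stand-in for the LLM: it proposes candidate next lines, each paired with an assumed "supplement" that completes the program so it can be executed. The reward is the fraction of public tests passed, and selection uses the standard UCT rule. The self-refine step is omitted. The parameters `n`, `c`, and `m` mirror the max rollouts, UCT constant, and max children named in Figure 4. All function names here are illustrative.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple

# A proposal is (next line, supplement lines that complete the program).
Proposal = Tuple[str, List[str]]
Test = Tuple[tuple, object]  # (positional args, expected return value)


@dataclass
class Node:
    context: List[str]            # lines fixed on the path from the root
    line: Optional[str]           # the line this node contributes (None at root)
    supplement: List[str]         # assumed: a completion of the remaining lines
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total: float = 0.0            # sum of backpropagated rewards

    def prefix(self) -> List[str]:
        """Context plus this node's own line: the prefix its children extend."""
        return self.context + ([self.line] if self.line is not None else [])

    def program(self) -> str:
        return "\n".join(self.prefix() + self.supplement)


def uct(child: Node, c: float) -> float:
    """Standard UCT score: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf
    mean = child.total / child.visits
    return mean + c * math.sqrt(math.log(child.parent.visits) / child.visits)


def pass_rate(program: str, func_name: str, tests: Sequence[Test]) -> float:
    """Fraction of public tests passed. exec runs arbitrary code: sandbox it in real use."""
    namespace: dict = {}
    try:
        exec(program, namespace)
    except Exception:
        return 0.0
    fn = namespace.get(func_name)
    if not callable(fn):
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            passed += fn(*args) == expected
        except Exception:
            pass  # a crashing test simply counts as failed
    return passed / len(tests)


def search(propose: Callable[[List[str]], List[Proposal]], func_name: str,
           tests: Sequence[Test], n: int = 8, c: float = 1.0,
           m: int = 3) -> Tuple[str, float]:
    root = Node(context=[], line=None, supplement=[])
    best_program, best_reward = "", -1.0
    for _ in range(n):
        # 1. Selection: follow the highest-UCT child down to a leaf.
        node = root
        while node.children:
            node = max(node.children, key=lambda ch: uct(ch, c))
        # 2. Expansion: attach up to m proposed next lines as children.
        for line, supplement in propose(node.prefix())[:m]:
            node.children.append(Node(node.prefix(), line, supplement, parent=node))
        # 3. Evaluation: score each new child (or the leaf itself if terminal).
        for leaf in node.children or [node]:
            reward = pass_rate(leaf.program(), func_name, tests)
            if reward > best_reward:
                best_program, best_reward = leaf.program(), reward
            # 4. Backpropagation: add the reward to every ancestor.
            cur: Optional[Node] = leaf
            while cur is not None:
                cur.visits += 1
                cur.total += reward
                cur = cur.parent
    return best_program, best_reward


def toy_propose(prefix: List[str]) -> List[Proposal]:
    """Hypothetical LLM stand-in: fixed candidate lines indexed by depth."""
    table = {
        0: [("def add(a, b):", ["    return 0"])],
        1: [("    return a - b", []), ("    return a + b", [])],
    }
    return table.get(len(prefix), [])


PUBLIC_TESTS = [((1, 2), 3), ((0, 0), 0)]
```

With these toy inputs, `search(toy_propose, "add", PUBLIC_TESTS)` returns the `return a + b` program with reward 1.0. Each child is scored on its full program (prefix plus supplement), which is what lets a line-level search receive a test-based reward before the program is finished. This reading of "supplement" is an assumption.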

Abstract

The emergence of large language models (LLMs) has significantly advanced the code generation task, sparking a surge in related literature. Current research is hindered by redundant generation results and a tendency to overfit local patterns in the short term. Although existing studies attempt to alleviate this issue by adopting a multi-token prediction strategy, little attention has been paid to choosing the appropriate processing length for generation. By analyzing the attention between tokens during the generation process of LLMs, we observe that high spikes in attention scores typically appear at the ends of lines. This insight suggests that it is reasonable to treat each line of code as the fundamental processing unit and to generate lines sequentially. Inspired by this, we propose the LSR-MCTS algorithm, which leverages MCTS to determine the code line by line and select the optimal path. Further, we integrate a self-refine mechanism at each node to enhance diversity and generate higher-quality programs through error correction. Extensive experiments and comprehensive analyses on three public coding benchmarks demonstrate that our method outperforms state-of-the-art approaches.
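The line-end observation can be probed with a small analysis like the one below. This is a hedged sketch, not the paper's procedure. It splits a token sequence into line-level units and compares how much attention line-end tokens receive relative to a uniform causal baseline. Under that baseline, query i sees i + 1 keys, so uniform weight is 1 / (i + 1). The attention matrix here is a synthetic toy built by hand, with newline keys given 3x weight, so the result only exercises the code. In practice the matrix would come from a model's attention weights for one head or an average over heads. The names `line_spans`, `attention_lift`, `line_end_vs_rest`, and `toy_attention` are hypothetical.

```python
from typing import List, Sequence, Tuple


def line_spans(tokens: Sequence[str]) -> List[Tuple[int, int]]:
    """Group token indices into half-open [start, end) spans, one per code line.
    A line is assumed to end at any token containing a newline character."""
    spans, start = [], 0
    for i, tok in enumerate(tokens):
        if "\n" in tok:
            spans.append((start, i + 1))
            start = i + 1
    if start < len(tokens):  # trailing line without a newline token
        spans.append((start, len(tokens)))
    return spans


def attention_lift(attn: Sequence[Sequence[float]]) -> List[float]:
    """Mean attention each key position receives from the queries that can see it
    (causal: query i >= key j), scaled by the uniform weight 1 / (i + 1)."""
    n = len(attn)
    lifts = []
    for j in range(n):
        ratios = [attn[i][j] * (i + 1) for i in range(j, n)]
        lifts.append(sum(ratios) / len(ratios))
    return lifts


def line_end_vs_rest(tokens: Sequence[str],
                     attn: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Average lift at line-end tokens versus all other tokens."""
    lifts = attention_lift(attn)
    ends = [v for t, v in zip(tokens, lifts) if "\n" in t]
    rest = [v for t, v in zip(tokens, lifts) if "\n" not in t]
    return sum(ends) / len(ends), sum(rest) / len(rest)


# Synthetic toy input, NOT model data.
TOKENS = ["def", " f", "():", "\n", "    return", " 1", "\n"]


def toy_attention(tokens: Sequence[str]) -> List[List[float]]:
    """Causal rows that give newline keys 3x the weight of other keys."""
    rows = []
    for i in range(len(tokens)):
        w = [3.0 if "\n" in tokens[j] else 1.0 for j in range(i + 1)]
        total = sum(w)
        rows.append([x / total for x in w] + [0.0] * (len(tokens) - i - 1))
    return rows
```

Scaling by the uniform baseline matters because raw column averages under a causal mask favor early positions: the first token is visible to every query. The lift removes that positional bias before comparing line-end tokens with the rest.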

Paper Structure

This paper contains 20 sections, 4 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: Examples of code generated by two kinds of methods. The token-by-token approach misunderstands the NL description (highlighted in red), leading it to generate a verbose program that passes only some of the test cases.
  • Figure 2: (a) A data case consisting of an NL description and a code block, with line numbers marked on the left. (b) Global attention heatmap, with the range of each line annotated; column-shaped spikes appear at the end of each line. (c) Local attention maps for the yellow snippets of the code block, with each corresponding token labeled below the graph; the line-end token '\n' (in green) stands out clearly in the bar chart.
  • Figure 3: The framework of LSR-MCTS. The red part in (a) shows the four iterative steps of LSR-MCTS: selection, expansion, evaluation, and backpropagation. The green sections show the self-refine process: new nodes are generated in the expansion step, and the LLM then uses these new nodes to construct a higher-quality refined node. Part (b) shows the content of a single node in more detail, including context, line, and supplement, with the main body "line" emphasized in bold blue. The first two programs are incorrect; through self-refine, they are adjusted into the correct program.
  • Figure 4: The impact of hyperparameter variations on GPT-4 performance. Hyperparameters include the number of max rollouts $n$, the UCT parameter $c$, and the maximum number of child nodes $m$ in the tree.
  • Figure 5: Performance of various self-refine methods on CodeLlama-7B-Instruct and GPT-4.
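Figure 4 names a UCT parameter $c$, a maximum number of rollouts $n$, and a maximum number of child nodes $m$. For reference, the textbook UCT selection rule and a pass-rate reward are written out below. They are assumed here for illustration, and the paper's own equations may define these terms differently. $Q(s')$ is the cumulative reward backpropagated through child $s'$, $N(\cdot)$ counts visits, $p$ is the program a node represents, and $T_{\mathrm{pub}}$ is the set of public test cases.

```latex
\begin{aligned}
\mathrm{UCT}(s') &= \frac{Q(s')}{N(s')} + c\sqrt{\frac{\ln N(s)}{N(s')}},
\qquad s^{\ast} = \arg\max_{s' \in \mathrm{children}(s)} \mathrm{UCT}(s'),
\qquad \lvert \mathrm{children}(s) \rvert \le m,\\[4pt]
r(p) &= \frac{\lvert \{\, t \in T_{\mathrm{pub}} : p \text{ passes } t \,\} \rvert}{\lvert T_{\mathrm{pub}} \rvert}.
\end{aligned}
```

Under this form, a larger $c$ favors rarely visited lines over well-scoring ones, $m$ caps the branching at each node, and $n$ bounds the number of selection, expansion, evaluation, and backpropagation iterations.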