Non-myopic Generation of Language Models for Reasoning and Planning

Chang Ma, Haiteng Zhao, Junlei Zhang, Junxian He, Lingpeng Kong

TL;DR

This paper revisits LLM reasoning from an optimal-control perspective and proposes Predictive-Decoding, a method that uses Model Predictive Control to improve planning accuracy, mitigate early errors, and promote non-myopic planning.

Abstract

Large Language Models have demonstrated remarkable abilities in reasoning and planning by breaking down complex problems into sequential steps. Despite their success in domains such as mathematical problem-solving and coding, LLMs struggle to plan reliably and optimally because autoregressive decoding is inherently myopic. This paper revisits LLM reasoning from an optimal-control perspective, proposing a novel method, Predictive-Decoding, that leverages Model Predictive Control to enhance planning accuracy. By re-weighting LLM distributions based on foresight trajectories, Predictive-Decoding aims to mitigate early errors and promote non-myopic planning. Our experiments show significant improvements across a wide range of math, coding, and agent tasks. Furthermore, Predictive-Decoding is computationally efficient, outperforming search baselines while using fewer computational resources. This study provides insights into optimizing LLM planning capabilities.
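The re-weighting idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the helper names (`toy_propose`, `toy_rollout`, `toy_score`), the additive toy task, the softmax form of the weights, and the default temperature are all assumptions made for illustration. The sketch samples candidate next steps, rolls each one forward into a foresight trajectory, scores the trajectory, and samples the next step from the re-weighted candidates.

```python
import math
import random

TARGET = 10   # hypothetical toy goal: the steps should sum to 10
HORIZON = 4   # hypothetical total number of steps in a plan


def softmax(scores, temperature):
    """Turn scores into weights that sum to 1 (numerically stable)."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def toy_propose(prefix, rng):
    """Stand-in for sampling one next step from a language model."""
    return rng.randint(1, 3)


def toy_rollout(steps, rng):
    """Stand-in for a foresight rollout: complete the plan at random."""
    missing = HORIZON - len(steps)
    return steps + [rng.randint(1, 3) for _ in range(missing)]


def toy_score(trajectory):
    """Stand-in evaluation: closer to TARGET is better."""
    return -abs(TARGET - sum(trajectory))


def predictive_decoding_step(prefix, propose, rollout, score,
                             num_candidates=4, temperature=0.5, rng=None):
    """Choose the next step by re-weighting candidates with foresight scores."""
    rng = rng or random.Random(0)
    candidates = [propose(prefix, rng) for _ in range(num_candidates)]
    # Score each candidate by looking ahead, not by the candidate alone.
    foresight = [score(rollout(prefix + [c], rng)) for c in candidates]
    weights = softmax(foresight, temperature)
    choice = rng.choices(candidates, weights=weights, k=1)[0]
    return choice, candidates, weights


if __name__ == "__main__":
    rng = random.Random(0)
    plan = []
    while len(plan) < HORIZON:
        step, _, _ = predictive_decoding_step(
            plan, toy_propose, toy_rollout, toy_score, rng=rng)
        plan.append(step)
    print(plan, sum(plan))
```

In this sketch a lower `temperature` concentrates the weights on the candidate with the best foresight score, while a higher one keeps the choice closer to the original sampling distribution.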

Paper Structure

This paper contains 38 sections, 3 theorems, 11 equations, 10 figures, 10 tables, 1 algorithm.

Key Result

Proposition 4.1

The distribution that solves the optimization problem in Eq. (eq: solution) has the form:

Figures (10)

  • Figure 1: The illustrative overview of Predictive-Decoding on one GSM8K example. LLM autoregressive planning often suffers from near sight. Predictive-Decoding rescales LLM generation distribution based on evaluation of foresight, enabling non-myopic planning.
  • Figure 2: Myopic Gap distribution for correct and wrong samples (drawn as a KDE plot). Myopic examples are defined as $p^*>0.01$. Wrong samples show a higher myopic rate on both tasks.
  • Figure 3: In GSM8K, the average score of the first incorrect step falls within the range of correct steps, but this no longer holds a few steps later.
  • Figure 4: Performance vs. efficiency on GSM8K. Predictive-Decoding is Pareto superior to Beam Search with longer foresight.
  • Figure 5: Performance and diversity tradeoff on HumanEval, controlled by the parameters $\tau$ and $\alpha$. Diversity is measured as 1 - ROUGE score.
  • ...and 5 more figures

Theorems & Definitions (5)

  • Definition 3.1
  • Proposition 4.1
  • Proposition B.1
  • proof
  • Lemma B.2