Learning to Plan Long-Term for Language Modeling

Florian Mai; Nathan Cornille; Marie-Francine Moens

TL;DR

By sampling multiple plans at once, the paper conditions the language model on an accurate approximation of the distribution of text continuations, which improves next-token prediction accuracy.

Abstract

Modern language models predict the next token in the sequence by considering the past text through a powerful function such as attention. However, language models have no explicit mechanism that allows them to spend computation time on planning long-distance future text, leading to suboptimal token prediction. In this paper, we propose a planner that predicts a latent plan for many sentences into the future. By sampling multiple plans at once, we condition the language model on an accurate approximation of the distribution of text continuations, which leads to better next-token prediction accuracy. In effect, this allows trading computation time for prediction accuracy.
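The idea of sampling several plans and conditioning on them jointly can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that a plan is one of a small set of discrete codes scored by the planner, and it swaps in plain mean pooling where the paper uses a learned multi-path adapter. All names (`sample_plans`, `empirical_plan_distribution`, `condition_vector`, `planner_logits`, `plan_embeddings`) are hypothetical.

```python
import math
import random


def sample_plans(logits, k, temperature=1.0, seed=0):
    """Draw k plan ids from softmax(logits / temperature)."""
    rng = random.Random(seed)
    scaled = [value / temperature for value in logits]
    top = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(value - top) for value in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=k)


def empirical_plan_distribution(plan_ids, num_plans):
    """Normalized histogram of the sampled plan ids."""
    counts = [0] * num_plans
    for plan_id in plan_ids:
        counts[plan_id] += 1
    return [count / len(plan_ids) for count in counts]


def condition_vector(plan_ids, plan_embeddings):
    """Mean of the sampled plans' embeddings.

    Assumption: mean pooling is only a stand-in for a learned adapter.
    """
    dim = len(plan_embeddings[0])
    return [
        sum(plan_embeddings[plan_id][d] for plan_id in plan_ids) / len(plan_ids)
        for d in range(dim)
    ]


# Toy inputs (made up for illustration): three candidate plans, 2-d embeddings.
planner_logits = [2.0, 0.5, -1.0]
plan_embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

plans = sample_plans(planner_logits, k=8, temperature=1.0)
print(empirical_plan_distribution(plans, num_plans=3))
print(condition_vector(plans, plan_embeddings))
```

Drawing more samples makes the histogram a closer approximation of the planner's distribution, at the cost of more computation. That is the compute-for-accuracy trade the abstract describes.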

Paper Structure (24 sections, 6 equations, 3 figures, 3 tables)

Introduction
Related Work
Predictive coding
Additional inference-time compute
Methods
Training an External Planner
Planning Multiple Steps Ahead
Multi-path Adapter
Experiments
Baselines and metrics
Hyperparameters
Results
Impact of multi-step predictions
Impact of conditioning on multiple paths
Ablations
...and 9 more sections

Figures (3)

Figure 1: Overview of our method.
Figure 2: Performance and relative generation time as a function of the number of samples $K$ drawn.
Figure 3: Perplexity on the validation set depending on the sampling temperature $\tau$. Since the textual context in the evaluation on the validation set is shorter, reported perplexities are larger than on the test set.
