Learning to Plan for Language Modeling from Unlabeled Data
Nathan Cornille, Marie-Francine Moens, Florian Mai
TL;DR
This work addresses the limits of pure next-token prediction for planning in large language models by introducing an external planner that learns to predict abstract writing actions from unlabeled data. The actions are derived by clustering sentence embeddings, and the planner-predicted actions are fed into the language model through an adapter, so planning requires no task-specific supervision. Empirically, the approach improves perplexity and text-structure generation for GPT-2 and OLMo models, and external planners outperform internal planning strategies. Because the planning framework is modular and self-supervised, planners can be developed at scale and shared across models, suggesting a path toward more coherent, structure-aware language generation.
Abstract
By training to predict the next token in an unlabeled corpus, large language models learn to perform many tasks without any labeled data. However, their next-token-prediction objective arguably limits their performance in scenarios that require planning, such as writing a coherent article. In this paper, we train a module for planning the future writing process via a self-supervised learning objective. Given the textual context, this planning module learns to predict future abstract writing actions, which correspond to centroids in a clustered text embedding space. By conditioning on these actions, our model extends the successful language model formula to more abstract planning in an unsupervised way. Empirically, we demonstrate that our method improves language modeling performance in general, particularly with respect to text structure. Because our framework uses a planner module that is unsupervised and external to the language model, new planner modules can be trained at large scale and easily shared with the community.
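The idea that abstract writing actions "correspond to centroids in a clustered text embedding space" can be illustrated with a minimal sketch. It assumes plain k-means with a deterministic farthest-point initialization and uses hand-made 2D vectors in place of real sentence embeddings; the function names, the toy data, and the choice of k = 2 are all hypothetical and are not taken from the paper. The sketch covers only the labeling step (sentence embedding to action id), not the planner or the adapter.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vec: Vector, centroids: List[List[float]]) -> int:
    """Index of the closest centroid; this index plays the role of the action id."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(vec, centroids[i]))


def init_centroids(embeddings: List[Vector], k: int) -> List[List[float]]:
    """Deterministic farthest-point initialization (an assumption of this sketch)."""
    centroids = [list(embeddings[0])]
    while len(centroids) < k:
        farthest = max(
            embeddings,
            key=lambda e: min(squared_distance(e, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(embeddings: List[Vector], k: int, iterations: int = 10) -> List[List[float]]:
    """Plain k-means: alternate nearest-centroid assignment and mean update."""
    centroids = init_centroids(embeddings, k)
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for e in embeddings:
            clusters[nearest_centroid(e, centroids)].append(e)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                dim = len(members[0])
                centroids[i] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
    return centroids


# Toy stand-ins for sentence embeddings (hypothetical data, two obvious groups).
toy_embeddings = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]

centroids = kmeans(toy_embeddings, k=2)

# Each sentence is labeled with the id of its nearest centroid: its "writing action".
actions = [nearest_centroid(e, centroids) for e in toy_embeddings]
print(actions)  # [0, 0, 0, 1, 1, 1]
```

Under these assumptions, the resulting sequence of action ids is what a planner would be trained to predict from the preceding text, which is why no human-provided labels are needed.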
