Language Model Planning from an Information Theoretic Perspective

Muhammed Ustaomeroglu, Baris Askin, Gauri Joshi, Carlee Joe-Wong, Guannan Qu

TL;DR

This work investigates whether decoder-only language models engage in planning by analyzing internal transformer computations through an information-theoretic lens. It introduces a VQ-VAE–based pipeline to compress hidden states into discrete codes and measures mutual information between prefix computations and future decision states, enabling assessment of forward-looking, branching, and history dependence across synthetic CFG, path-finding, and natural-language tasks. Key findings show that planning horizons are task-dependent, models retain information about unused correct continuations, and predictions draw predominantly on recent computations with nontrivial influence from earlier blocks. The proposed framework provides a general, automated method to probe internal LM dynamics, with implications for interpretability and principled model design.

Abstract

The extent to which decoder-only language models (LMs) engage in planning, that is, organizing intermediate computations to support coherent long-range generation, remains an open and important question, with implications for interpretability, reliability, and principled model design. Planning involves structuring computations over long horizons, considering multiple possible continuations, and selectively reusing past information, but how effectively transformer-based LMs realize these capabilities is still unclear. We address these questions by analyzing the hidden states at the core of transformer computations, which capture intermediate results and act as carriers of information. Since these hidden representations are often redundant and encumbered with fine-grained details, we develop a pipeline based on vector-quantized variational autoencoders that compresses them into compact summary codes. These codes enable measuring mutual information, allowing systematic analysis of the computational structure underlying model behavior. Using this framework, we study planning in LMs across synthetic grammar, path-finding tasks, and natural language datasets, focusing on three key aspects: (i) the planning horizon of pre-output computations, (ii) the extent to which the model considers alternative valid continuations, and (iii) the reliance of new predictions on earlier computations. By answering these questions, we advance the understanding of how planning is realized in LMs and contribute a general-purpose pipeline for probing the internal dynamics of LMs and deep learning systems. Our results reveal that the effective planning horizon is task-dependent, that models implicitly preserve information about unused correct continuations, and that predictions draw most on recent computations, though earlier blocks remain informative.
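
As a rough illustration of the measurement step described above, the sketch below computes a plug-in mutual information estimate between two sequences of discrete codes. It assumes the hidden states have already been quantized into integer code indices. The function names and the choice to normalize by the entropy of the target codes are assumptions made for illustration, not details stated in the abstract.

```python
import math
from collections import Counter


def entropy(codes):
    """Plug-in (empirical) Shannon entropy in bits of a sequence of hashable codes."""
    n = len(codes)
    counts = Counter(codes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) = H(X) + H(Y) - H(X, Y) from paired samples."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of samples")
    joint = list(zip(x, y))
    return entropy(x) + entropy(y) - entropy(joint)


def normalized_mi(x, y):
    """I(X; Y) divided by H(Y). This normalization is an assumption of the sketch."""
    h_y = entropy(y)
    if h_y == 0.0:
        return 0.0  # a constant target carries no information to predict
    return mutual_information(x, y) / h_y


# Hypothetical code indices: prefix summary codes and future decision-state codes.
prefix_codes = [0, 0, 1, 1]
future_codes_dependent = [0, 0, 1, 1]    # fully determined by the prefix code
future_codes_independent = [0, 1, 0, 1]  # empirically independent of the prefix code

nmi_dependent = normalized_mi(prefix_codes, future_codes_dependent)
nmi_independent = normalized_mi(prefix_codes, future_codes_independent)
```

Plug-in estimates of this kind are biased upward when the number of samples is small relative to the number of distinct code pairs, so a real analysis would need enough samples per codebook entry or a bias correction.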

Paper Structure

This paper contains 51 sections, 21 equations, 13 figures, 1 table.

Figures (13)

  • Figure 1: Visualizations of the experimental settings. Left: History in the plan experiment (§\ref{sect:plan_history}). Middle: Horizon of the plan and branches in the plan experiments (§\ref{sect:plan_horizon} and §\ref{sect:plan_branch}). In the left and middle panels, the red, blue, and green colors indicate the target variables used in our analysis. Right: An illustration of a simplified sample in the PF task. Graphs and token numbers are randomly generated except for the start node (1) and the goal node (2). A sample prompt is "19 23 , 13 2 , 11 8 , 4 17 , 1 23 , 9 7, 2 8, 17 1 , 13 2, 14 23, 23 7, 1 23 :" and correct responses are "1 11 8 2" or "1 17 13 2".
  • Figure 2: $\text{nMI}$ results between the prefix summary codes and the last hidden state codes of generated tokens for CFG (a) and PF (b) tasks. $\text{nMI}$ decays fast in CFG, showing short and local planning, while PF maintains or even increases $\text{nMI}$ beyond $\tau=1$, reflecting planning for later steps.
  • Figure 3: nMI across blocks and layers, and conditional nMI. Left: nMI between the hidden state block codes and the token decision state code at $\tau=0$, $\text{nMI}(\mathbf{B}_{k}^{\ell}; Z_T^{L})$. Middle: nMI between block codes and the last-layer decision code at $\tau=1$, $\text{nMI}(\mathbf{B}_{k}^{\ell}; Z_{T+1}^{L})$. In both heatmaps, nMI is higher for recent blocks (small $k$) and final layers (high $\ell$). Right: Conditional nMI for the 1$^{\text{st}}$ block, $\text{nMI}(Z^{\ell}_{T-15:T-1}; Z_T^{L}\mid Z_T^{\ell})$, showing that most of the dependence between the 1$^{\text{st}}$ block and the generated token at $\tau=0$ is attributable to the final prefix position $T$.
  • Figure 4: Comparison of our normalized latent mutual information estimates across different codebook sizes.
  • Figure 5: Codebook similarities for the $\texttt{VQ-VAE}$ encoding $H$ in the CFG task.
  • ...and 8 more figures
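
The conditional quantity in the Figure 3 caption can be illustrated with a small sketch of conditional mutual information over discrete codes, using the identity I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z). The variable names and toy sequences are hypothetical, and the estimator shown is a plain plug-in estimate without the normalization used for nMI, so it should be read as an illustration of the idea rather than as the paper's procedure.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in Shannon entropy in bits of a sequence of hashable items."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conditional_mi(x, y, z):
    """Plug-in estimate of I(X; Y | Z) from aligned samples of discrete codes."""
    if not (len(x) == len(y) == len(z)):
        raise ValueError("x, y and z must contain the same number of samples")
    h_xz = entropy(list(zip(x, z)))
    h_yz = entropy(list(zip(y, z)))
    h_xyz = entropy(list(zip(x, y, z)))
    return h_xz + h_yz - h_xyz - entropy(z)


# Hypothetical codes: x = earlier-position block code, y = decision-state code,
# z = code at the final prefix position.
x = [0, 1, 0, 1]
y = [0, 1, 0, 1]

z_unrelated = [0, 0, 1, 1]    # z tells us nothing about x or y
z_explains_all = [0, 1, 0, 1]  # z already carries everything x knows about y

cmi_unrelated = conditional_mi(x, y, z_unrelated)
cmi_explained = conditional_mi(x, y, z_explains_all)
```

In the second case the conditional estimate drops to zero because the conditioning code already accounts for the shared information, which mirrors the caption's reading that most of the dependence is attributable to the final prefix position.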