Language Models Struggle to Use Representations Learned In-Context

Michael A. Lepori, Tal Linzen, Ann Yuan, Katja Filippova

TL;DR

The paper investigates whether LLMs can flexibly deploy representations learned in-context to solve downstream tasks. It uses two metrics, Dirichlet Energy $E_G(H^l(\mathcal{T}))$ and Distance Correlation $D_C(H^l(\mathcal{T}))$, to quantify how well token representations align with a latent state-space topology. It evaluates open-weights LLMs on next-token prediction and adaptive world modeling (AWM), and frontier reasoning models on AWM. Across both tasks, open-weights models encode in-context semantics in their latent representations but largely fail to deploy them. Frontier models show partial improvements, suggesting that reasoning chains can partially compensate but do not fully enable flexible deployment. The findings underscore the need for targeted training regimens or architectural changes to create adaptable agents that can use in-context information flexibly as contexts shift.
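
To make the first metric concrete, here is a minimal sketch of a Dirichlet energy computation over a grid topology. It assumes the common graph definition (the sum of squared distances between representations of adjacent nodes, with each undirected edge counted once); the paper's exact normalization is not given in this summary and may differ. The function names, the 5-by-5 grid size, and the use of 2-D grid coordinates as stand-in "representations" are all illustrative assumptions, not the authors' code.

```python
import numpy as np

def grid_adjacency(n):
    """Adjacency matrix of an n-by-n 4-connected grid, nodes in row-major order."""
    A = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            k = i * n + j
            if i + 1 < n:  # edge to the node one row down
                A[k, k + n] = A[k + n, k] = 1
            if j + 1 < n:  # edge to the node one column right
                A[k, k + 1] = A[k + 1, k] = 1
    return A

def dirichlet_energy(A, H):
    """Sum of squared distances between representations of adjacent nodes.

    A is a symmetric adjacency matrix and H has one row per node.
    The factor 0.5 counts each undirected edge once (an assumed convention).
    """
    diffs = H[:, None, :] - H[None, :, :]   # pairwise differences
    sq_dists = (diffs ** 2).sum(axis=-1)    # pairwise squared distances
    return 0.5 * float((A * sq_dists).sum())

n = 5
A = grid_adjacency(n)

# Stand-in representations: each node's own grid coordinates (perfectly aligned).
coords = np.array([(i, j) for i in range(n) for j in range(n)], dtype=float)

# Misaligned representations: the same vectors assigned to random nodes.
rng = np.random.default_rng(0)
shuffled = coords[rng.permutation(n * n)]

aligned_energy = dirichlet_energy(A, coords)
shuffled_energy = dirichlet_energy(A, shuffled)
print(aligned_energy, shuffled_energy)
```

Under this definition, lower energy means adjacent states sit closer together in representation space. In the toy setup, the aligned layout reaches the minimum possible value over all reassignments of the same vectors, because every grid edge then spans a squared distance of exactly 1.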

Abstract

Though large language models (LLMs) have enabled great success across a wide variety of tasks, they still appear to fall short of one of the loftier goals of artificial intelligence research: creating an artificial system that can adapt its behavior to radically new contexts upon deployment. One important step towards this goal is to create systems that can induce rich representations of data that are seen in-context, and then flexibly deploy these representations to accomplish goals. Recently, Park et al. (2024) demonstrated that current LLMs are indeed capable of inducing such representations from context (i.e., in-context representation learning). The present study investigates whether LLMs can use these representations to complete simple downstream tasks. We first assess whether open-weights LLMs can use in-context representations for next-token prediction, and then probe models using a novel task, adaptive world modeling. In both tasks, we find evidence that open-weights LLMs struggle to deploy representations of novel semantics that are defined in-context, even if they encode these semantics in their latent representations. Furthermore, we assess closed-source, state-of-the-art reasoning models on the adaptive world modeling task, demonstrating that even the most performant LLMs cannot reliably leverage novel patterns presented in-context. Overall, this work seeks to inspire novel methods for encouraging models to not only encode information presented in-context, but to do so in a manner that supports flexible deployment of this information.

Paper Structure (30 sections, 3 equations, 20 figures, 2 tables)

Figures (20)

  • Figure 1: (Top Left) Example of an N-by-N state space topology used to generate a random walk. (Right) Examples of next token prediction prompts in the Instruction or Prefilled condition. Prompt formatting tokens are not bolded for readability. In the Instruction condition, models need to deploy in-context representations that are formed during the random walk (colored tokens) after an interval of several tokens in order to predict a valid next token. In the Prefilled condition, in-context representations are deployed immediately. (Bottom Left) Example of an adaptive world modeling prompt, which consists of a random walk (colored tokens), followed by few-shot examples defining a rule that maps states at one position to states at another. Here the rule maps states $s_{i,j}$ to $s_{i+2, j}$ (a "two-step rule"), so city would be mapped to wing.
  • Figure 2: Example of in-context representation learning over a 5-by-5 grid topology.
  • Figure 3: Next-token prediction results for all open-weights models over all topologies. Models struggle when the random walk is present in the user prompt, rather than in a prefilled model response. This indicates that the representations learned in-context during the random walk/random adjacencies are not easily deployed when they need to be used at a later time.
  • Figure 4: Adaptive World Modeling results. Across all task configurations, various open-weights LMs (each represented by a separate dot) struggle, despite encoding the underlying grid topology in their latent representations. Note: A naive strategy of randomly selecting an attested adjacency would achieve approximately 50% (for one-dimensional) and 30% (for two-dimensional) accuracy for the "one-step" rules.
  • Figure 5: (Top) Open-weights LM results on few-shot learning when the state space topology is explicitly presented in the prompt. LMs achieve variable performance, but it is typically much higher than in the AWM setting. (Bottom) Relative Dirichlet Energy between tokens in the random walk and in the few-shot examples (and between random walk tokens and uncontextualized tokens; light bars) over two layers. In-context representations encode the underlying topology with lower fidelity in the few-shot examples.
  • ...and 15 more figures
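
The prompt ingredients described in the Figure 1 caption (a random walk over a grid of states, plus a rule mapping $s_{i,j}$ to $s_{i+2,j}$) can be sketched as below. This is an illustrative reconstruction, not the authors' generation code: the 3-by-3 grid, the walk length, the function names, and every word except "city" and "wing" are made up, and the layout placing "wing" two rows below "city" is an assumption chosen only so that the example reproduces the caption's mapping.

```python
import random

def grid_neighbors(n_rows, n_cols):
    """Map each state (i, j) to its 4-connected neighbours on the grid."""
    nbrs = {}
    for i in range(n_rows):
        for j in range(n_cols):
            cand = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
            nbrs[(i, j)] = [(a, b) for a, b in cand
                            if 0 <= a < n_rows and 0 <= b < n_cols]
    return nbrs

def random_walk(nbrs, length, rng):
    """Sample a walk in which every step moves to an adjacent state."""
    state = rng.choice(sorted(nbrs))
    walk = [state]
    for _ in range(length - 1):
        state = rng.choice(nbrs[state])
        walk.append(state)
    return walk

def two_step_rule(state, n_rows):
    """Map s_{i,j} to s_{i+2,j}; return None if the target is off the grid."""
    i, j = state
    return (i + 2, j) if i + 2 < n_rows else None

# Hypothetical token layout: only "city" and "wing" come from the caption.
TOKENS = [
    ["city", "lamp", "frog"],
    ["moss", "drum", "kite"],
    ["wing", "salt", "pear"],
]

nbrs = grid_neighbors(3, 3)
walk = random_walk(nbrs, 12, random.Random(0))
walk_tokens = [TOKENS[i][j] for i, j in walk]

query = (0, 0)                      # the state labelled "city"
target = two_step_rule(query, 3)    # the state two rows below it

print(" ".join(walk_tokens))        # the in-context random walk
print(TOKENS[query[0]][query[1]], "->", TOKENS[target[0]][target[1]])  # prints "city -> wing"
```

The walk only ever exposes one-step adjacencies, so answering a two-step query requires composing the topology implied by the walk rather than recalling a pair of tokens that appeared next to each other.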