Just-in-time and distributed task representations in language models

Yuxuan Li, Declan Campbell, Stephanie C. Y. Chan, Andrew Kyle Lampinen

TL;DR

This work investigates how language models develop task representations in-context, distinguishing between continuous task identity signals and sporadically activating transferrable representations that can be injected to restore task context. Using linear decoders and a multi-token, patch-based extraction approach, the authors show that transferrable task representations accumulate evidence across in-context examples but activate only at key tokens, while task identity signals persist throughout the prompt. The study reveals distinct temporal and semantic locality patterns, including varying task scopes and cross-token transfer, and highlights model-dependent differences in convergence behavior. The findings have implications for mechanistic interpretability and practical prompt design, suggesting that effective control may require strategic, time- and scope-aware re-instantiation of task states. Overall, the paper provides a nuanced view of how in-context learning constructs and deploys task representations across tokens and subtasks in large language models.
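The task identity signal is read out with linear decoders. The sketch below shows what such a probe does, assuming a multinomial logistic-regression decoder trained with plain SGD; the summary does not specify the decoder's exact form or training setup. The "activations" here are hypothetical synthetic vectors (a per-task mean plus Gaussian noise) standing in for residual-stream activations at one layer and token position, and all function names are illustrative.

```python
import math
import random


def make_synthetic_activations(n_tasks=4, dim=8, n_train=20, n_test=10, noise=0.5, seed=0):
    """Hypothetical stand-in for residual activations: each task has a random
    mean vector, and each sample is that mean plus Gaussian noise."""
    rng = random.Random(seed)
    means = [[rng.gauss(0.0, 2.0) for _ in range(dim)] for _ in range(n_tasks)]

    def sample(n):
        X, y = [], []
        for task_id, mu in enumerate(means):
            for _ in range(n):
                X.append([m + rng.gauss(0.0, noise) for m in mu])
                y.append(task_id)
        return X, y

    return sample(n_train) + sample(n_test)  # (X_train, y_train, X_test, y_test)


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bias for row, bias in zip(W, b)]


def train_linear_decoder(X, y, n_classes, lr=0.05, epochs=50, seed=0):
    """Multinomial logistic-regression probe fitted with plain SGD."""
    rng = random.Random(seed)
    dim = len(X[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            probs = softmax(logits_for(W, b, X[i]))
            for c in range(n_classes):
                grad = probs[c] - (1.0 if c == y[i] else 0.0)  # d(cross-entropy)/d(logit_c)
                b[c] -= lr * grad
                for j in range(dim):
                    W[c][j] -= lr * grad * X[i][j]
    return W, b


def predict(W, b, x):
    scores = logits_for(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)


def decoding_accuracy(W, b, X, y):
    return sum(predict(W, b, x) == label for x, label in zip(X, y)) / len(y)


if __name__ == "__main__":
    X_tr, y_tr, X_te, y_te = make_synthetic_activations()
    W, b = train_linear_decoder(X_tr, y_tr, n_classes=4)
    print(f"held-out task identity accuracy: {decoding_accuracy(W, b, X_te, y_te):.2f}")
```

In the paper's setting, one such probe would be fitted per layer and token position (over 14 tasks), which yields a decoding-accuracy map like the one in Figure 2B. A linear probe is a deliberate choice: high accuracy shows the task label is linearly readable, not that the model uses that direction.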

Abstract

Many of language models' impressive capabilities originate from their in-context learning: based on instructions or examples, they can infer and perform new tasks without weight updates. In this work, we investigate when representations for new tasks are formed in language models, and how these representations change over the course of context. We study two different task representations: those that are "transferrable" (vector representations that can transfer task contexts to another model instance, even without the full prompt) and simpler representations of high-level task categories. We show that transferrable task representations evolve in non-monotonic and sporadic ways, while task identity representations persist throughout the context. Specifically, transferrable task representations exhibit a two-fold locality. They successfully condense evidence when more examples are provided in the context. But this evidence accrual process exhibits strong temporal locality along the sequence dimension, coming online only at certain tokens, despite task identity being reliably decodable throughout the context. In some cases, transferrable task representations also show semantic locality, capturing a small task "scope" such as an independent subtask. Language models thus represent new tasks on the fly through both an inert, sustained sensitivity to the task and an active, just-in-time representation to support inference.
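The "transferrable" representations are obtained by patching: a residual state is read from a source token in a few-shot prompt and written into a target token of a zero-shot prompt (Figure 1A). The sketch below illustrates the mechanics on a hypothetical toy causal model (each layer adds a scaled prefix mean to every position) rather than a real transformer; `ToyCausalModel`, `extract_task_vector`, and `recontextualize` are illustrative names, and only single-position patching is shown, not the paper's multi-token variant. In a real model the same read/overwrite step would typically be implemented with framework hooks on a chosen layer.

```python
import random


def embed(token, dim=4):
    """Deterministic pseudo-embedding (hypothetical). A str seed is hashed with
    SHA-512 by random.Random, so vectors are stable across runs."""
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


class ToyCausalModel:
    """Hypothetical stand-in for a transformer residual stream: every layer adds
    a scaled mean of the causal prefix (positions 0..t) to position t."""

    def __init__(self, n_layers=4, mix=0.5):
        self.n_layers = n_layers
        self.mix = mix

    def run(self, tokens, patch=None):
        """Return states[layer][position]; layer 0 is the embedding output.
        patch = (layer, position, vector) overwrites one residual state, and
        every later layer sees the overwritten value."""
        h = self._apply_patch([embed(t) for t in tokens], 0, patch)
        states = [h]
        for layer in range(1, self.n_layers + 1):
            running = [0.0] * len(h[0])
            new_h = []
            for t, vec in enumerate(h):
                running = [r + v for r, v in zip(running, vec)]
                prefix_mean = [r / (t + 1) for r in running]
                new_h.append([v + self.mix * m for v, m in zip(vec, prefix_mean)])
            h = self._apply_patch(new_h, layer, patch)
            states.append(h)
        return states

    @staticmethod
    def _apply_patch(h, layer, patch):
        if patch is None or patch[0] != layer:
            return h
        _, position, vector = patch
        patched = [list(v) for v in h]
        patched[position] = list(vector)
        return patched


def extract_task_vector(model, few_shot_tokens, layer, source_pos=-1):
    """Read the residual state at the source token (e.g. the last ':')."""
    return model.run(few_shot_tokens)[layer][source_pos]


def recontextualize(model, zero_shot_tokens, task_vector, layer, target_pos=-1):
    """Inject the task vector at the target token of a zero-shot prompt."""
    position = target_pos % len(zero_shot_tokens)
    return model.run(zero_shot_tokens, patch=(layer, position, task_vector))


if __name__ == "__main__":
    model = ToyCausalModel()
    few_shot = ["hot", ":", "cold", "\n", "big", ":", "small", "\n", "up", ":"]
    zero_shot = ["up", ":"]
    tv = extract_task_vector(model, few_shot, layer=2)
    patched = recontextualize(model, zero_shot, tv, layer=2)
    print(patched[-1][-1])  # final-layer state at the patched ':' token
```

Because the toy model is causal, the patch changes only the target position at the patched layer and everything downstream of it; earlier positions and earlier layers are untouched, which is the property that lets the injected state stand in for the missing examples.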

Paper Structure

This paper contains 29 sections, 25 figures, 5 tables.

Figures (25)

  • Figure 1: Understanding how task representations develop over context. A. A schematic of extracting transferrable task representations and restoring task contexts (via patching) in zero-shot settings. The highlighted tokens indicate the source and target for extracting and injecting task representations. B. Transferrable task representations restore task accuracy on zero-shot prompts. Results are aggregated over all models for simple tasks (see the appendix listing all tasks). Error bars indicate the 95% CI over tasks. C. An overview of the development of different task representations over context. Solid bars: recontextualized zero-shot accuracy for task vectors extracted from different tokens. Transparent bars: task identity decoding accuracy from different token representations.
  • Figure 2: Transferrable task representations activate sporadically at key tokens, but task identity representations persist throughout the context. A. Recontextualization accuracy when each token representation is used to restore task contexts in zero-shot settings. B. Task identity decoding accuracy (among 14 tasks) for token representations at different layers and positions. This figure plots aligned sequences across different samples and tasks; since exact positions differ depending on the sample, the indices shown in the labels are approximate. See the Gemma 3 appendix heatmap figure for results on other models.
  • Figure 3: Sporadic & inconsistent evidence accrual in language models. A. Task vectors extracted from the last colon token in each example capture evidence accrual on most tasks (12 out of 14). However, on two "hard-to-transfer" tasks, task vectors do not capture this evidence accumulation, even though the models (behaviorally) do learn from more examples. The solid bars indicate recontextualized zero-shot accuracy (via task vectors), and light bars in the background indicate few-shot accuracy (without task vectors). The dotted lines indicate the ratio of the recontextualized zero-shot accuracy to the few-shot accuracy. B. Most other format tokens in the context do not robustly form transferrable task representations that support zero-shot recontextualization, but task identity is reliably decodable from their residual activations. Here, we report the task identity decoding accuracy at the modal best layer, i.e., the layer at which transferrable task representations most often form at the second ":" token. See the main text for more details.
  • Figure 4: Analyses of extracted task representations in Gemma V3 models. A. The extracted task vectors (at the last colon token) tend to decrease in both variance and magnitude with more examples, exhibiting a general tendency to condense evidence and converge onto stable task representations. The solid line shows the average across tasks. The transparent lines show the individual tasks. B. Extracted task vectors from the 27B model form distinct clusters. The numbers label the centroid for each task (see the legend; results for other models are in the t-SNE figure). Task vectors are similar but distinguishable when a task is evaluated independently vs. embedded within a larger task structure. For example, representations for antonym (0), antonym x 3 (14), and antonym as the first task in a mixed-generation task chain (24 and 27) are close but distinct.
  • Figure 5: Reinstantiated task contexts in longer- and mixed-generation tasks often decay over generation, especially for tasks that can be decomposed into semantically-independent subtasks. This suggests a tendency for models to only activate transferrable representations for small task scopes. A. Bar plot: recontextualized zero-shot accuracy compared to zero-shot and 8-shot accuracy on longer-generation tasks; accuracies within each task are averaged across output units. Line plots: recontextualization accuracy for each output unit, conditioned on sequences where models generated full correct responses with eight examples in-context. An output unit usually corresponds to a single word and is occasionally a short phrase (e.g. the capital of a country). B. Visualization as in A, but for mixed-generation tasks.
  • ...and 20 more figures
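Figures 3A and 4A rest on a few simple summary statistics of extracted task vectors. The sketch below computes them under stated assumptions: "variance" is taken as the mean squared Euclidean distance of a task's vectors (one per prompt sample) from their centroid, "magnitude" as the mean L2 norm, and the Figure 3A ratio as recontextualized zero-shot accuracy divided by few-shot accuracy. The captions do not pin down these exact definitions, and the function names are hypothetical.

```python
import math


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def total_variance(vectors):
    """Mean squared Euclidean distance of each vector from the centroid."""
    c = centroid(vectors)
    return sum(sum((x - m) ** 2 for x, m in zip(v, c)) for v in vectors) / len(vectors)


def mean_magnitude(vectors):
    return sum(l2_norm(v) for v in vectors) / len(vectors)


def summarize_by_shots(task_vectors_by_k):
    """Map k (number of in-context examples) -> (variance, magnitude) for one
    task, given the task vectors extracted at the last colon token for each k."""
    return {
        k: (total_variance(vs), mean_magnitude(vs))
        for k, vs in sorted(task_vectors_by_k.items())
    }


def recontextualization_ratio(recontextualized_zero_shot_acc, few_shot_acc):
    """Assumed form of the Figure 3A ratio; undefined (NaN) when few-shot accuracy is 0."""
    if few_shot_acc <= 0:
        return float("nan")
    return recontextualized_zero_shot_acc / few_shot_acc


if __name__ == "__main__":
    toy = {1: [[0.0, 0.0], [2.0, 0.0]], 2: [[3.0, 4.0], [3.0, 4.0]]}  # made-up vectors
    for k, (var, mag) in summarize_by_shots(toy).items():
        print(f"k={k}: variance={var:.2f}, magnitude={mag:.2f}")
```

Under these definitions, the convergence described in Figure 4A would appear as both numbers shrinking as k grows, and a ratio near 1 in Figure 3A would mean the injected task vector recovers most of the few-shot behavior.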