Just-in-time and distributed task representations in language models
Yuxuan Li, Declan Campbell, Stephanie C. Y. Chan, Andrew Kyle Lampinen
TL;DR
This work investigates how language models form task representations in context. It distinguishes persistent task identity signals from sporadically activating transferrable representations, which can be injected into another model instance to restore the task context. Using linear decoders and a multi-token, patch-based extraction approach, the authors show that transferrable task representations accumulate evidence across in-context examples but activate only at key tokens, while task identity signals persist throughout the prompt. The study reveals distinct temporal and semantic locality patterns, including varying task scopes and cross-token transfer, and highlights model-dependent differences in convergence behavior. The findings bear on mechanistic interpretability and on practical prompt design: effective control may require re-instantiating task states at the right tokens and at the right task scope.
Abstract
Many of language models' impressive capabilities originate from their in-context learning: based on instructions or examples, they can infer and perform new tasks without weight updates. In this work, we investigate when representations for new tasks are formed in language models, and how these representations change over the course of context. We study two different task representations: those that are "transferrable" (vector representations that can transfer task contexts to another model instance, even without the full prompt) and simpler representations of high-level task categories. We show that transferrable task representations evolve in non-monotonic and sporadic ways, while task identity representations persist throughout the context. Specifically, transferrable task representations exhibit a two-fold locality. They successfully condense evidence when more examples are provided in the context. But this evidence accrual process exhibits strong temporal locality along the sequence dimension, coming online only at certain tokens, despite task identity being reliably decodable throughout the context. In some cases, transferrable task representations also show semantic locality, capturing a small task "scope" such as an independent subtask. Language models thus represent new tasks on the fly through both an inert, sustained sensitivity to the task and an active, just-in-time representation to support inference.
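To make "task identity being reliably decodable" concrete, the following is a minimal sketch of a linear decoder, not the authors' implementation. It assumes synthetic vectors in place of real model hidden states, and every name in it (`make_task_means`, `sample_activations`, `fit_linear_decoder`) is hypothetical. The decoder is a nearest-class-mean classifier, chosen here only because it is linear and needs no external libraries; the abstract does not specify which decoder the paper uses.

```python
import random


def make_task_means(n_tasks, dim, rng):
    """Assumption: each task has one characteristic direction in activation space."""
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tasks)]


def sample_activations(task_means, n_per_task, noise, rng):
    """Synthetic stand-in for hidden states: task direction plus Gaussian noise."""
    return [
        ([m + rng.gauss(0, noise) for m in mean], task_id)
        for task_id, mean in enumerate(task_means)
        for _ in range(n_per_task)
    ]


def fit_linear_decoder(train, n_tasks, dim):
    """Nearest-class-mean decoder, written as linear scores w_c . x + b_c."""
    sums = [[0.0] * dim for _ in range(n_tasks)]
    counts = [0] * n_tasks
    for x, y in train:
        counts[y] += 1
        for i, value in enumerate(x):
            sums[y][i] += value
    weights = [[s / counts[c] for s in sums[c]] for c in range(n_tasks)]
    # The bias term makes argmax of the score equal to the nearest class mean.
    biases = [-0.5 * sum(w * w for w in weights[c]) for c in range(n_tasks)]
    return weights, biases


def predict(x, weights, biases):
    """Return the task id with the highest linear score."""
    scores = [
        sum(w * v for w, v in zip(w_c, x)) + b for w_c, b in zip(weights, biases)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(data, weights, biases):
    """Fraction of examples whose task id is decoded correctly."""
    return sum(predict(x, weights, biases) == y for x, y in data) / len(data)


rng = random.Random(0)
task_means = make_task_means(n_tasks=4, dim=16, rng=rng)
train = sample_activations(task_means, n_per_task=40, noise=0.5, rng=rng)
test = sample_activations(task_means, n_per_task=10, noise=0.5, rng=rng)

weights, biases = fit_linear_decoder(train, n_tasks=4, dim=16)
test_accuracy = accuracy(test, weights, biases)
print(f"held-out task-identity decoding accuracy: {test_accuracy:.2f}")
```

With real models, one such decoder could be fit per token position on extracted activations. Decoding task identity this way is a separate question from whether the activation at that position transfers the task when injected into another model instance, which is the distinction the abstract draws.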
