Emergent Structured Representations Support Flexible In-Context Inference in Large Language Models

Ningyu Xu, Qi Zhang, Xipeng Qiu, Xuanjing Huang

TL;DR

This work investigates the internal processing of LLMs during in-context concept inference, revealing a conceptual subspace emerging in middle to late layers, whose representational structure persists across contexts.

Abstract

Large language models (LLMs) exhibit emergent behaviors suggestive of human-like reasoning. While recent work has identified structured, human-like conceptual representations within these models, it remains unclear whether they functionally rely on such representations for reasoning. Here we investigate the internal processing of LLMs during in-context concept inference. Our results reveal a conceptual subspace emerging in middle to late layers, whose representational structure persists across contexts. Using causal mediation analyses, we demonstrate that this subspace is not merely an epiphenomenon but is functionally central to model predictions, establishing its causal role in inference. We further identify a layer-wise progression where attention heads in early-to-middle layers integrate contextual cues to construct and refine the subspace, which is subsequently leveraged by later layers to generate predictions. Together, these findings provide evidence that LLMs dynamically construct and use structured, latent representations in context for inference, offering insights into the computational processes underlying flexible adaptation.

Paper Structure

This paper contains 31 sections, 8 equations, 16 figures, and 1 table.

Figures (16)

  • Figure 1: Illustration of how an emergent conceptual subspace supports in-context inference in LLMs. A Transformer-based LLM is presented with a small set of description--word demonstrations followed by a query description. As the model processes the prompt across layers, it integrates contextual cues to construct a shared conceptual subspace that emerges in the middle-to-late layers. Hidden states can be projected into this subspace, where the relational structure among representations persists across layers and across different demonstration contexts. Causal interventions show that the subspace and its internal relational structure are functionally involved in inference and can constrain subsequent computations leading to the final prediction.
  • Figure 2: A shared conceptual subspace emerges in the middle to late layers of Llama-3.1 70B. a, Layer-wise similarity of hidden states, measured as subspace overlap (mean squared cosine of principal angles) between SVD subspaces explaining 95% variance; averaged over five runs with 24 demonstrations. Axes index layers. b, Number of principal components (PCs) needed to explain 95% variance across layers, increasing sharply in the middle layers. Shaded areas represent 95% CIs calculated from 10,000 bootstrap resamples across five runs. c, Overlap between GCCA-derived projection matrices across selected layers. d--e, The conceptual subspace becomes increasingly stable as the number of in-context demonstrations grows. d, Cross-context alignment of representational geometry within the GCCA subspace across demonstration sets, measured by RSA. e, Overlap between GCCA-derived projection matrices across demonstration sets. Axes in d--e indicate the number of demonstrations, with each cell representing a single run.
  • Figure 3: The conceptual subspace causally mediates model inference. a--c, Activation patching with $N$ demonstrations under three corruption conditions: description (a), label (b) and query (c). Patching the conceptual subspace (blue) is compared against a random-subspace baseline (red). The x-axis indexes layers, and the y-axis shows the normalized causal indirect effect (CIE). d--e, Subspace necessity and sufficiency tested by ablating the conceptual subspace (d) or isolating it (e). The y-axis denotes the change in log-probability of the correct token. f, Causal effects of cross-context transfer, where the relational structure from a source context is adapted to a target context via an orthogonal transformation. The y-axis reports the causal mediation (CMA) score. Performance is compared against a random-subspace baseline (red) and a same-context patch reference (black) using the conceptual subspace derived from the target context itself. Shaded regions indicate 95% CIs computed from 10,000 bootstrap resamples across five runs.
  • Figure 4: Attention patterns of attention heads with statistically significant causal indirect effects (CIEs) identified under description (a), label (b), and query corruption (c). The x-axis denotes attention heads by their (layer, head) index, and the y-axis shows the token spans attended to, grouped by source segment. The description, delimiter, and label spans for the 24 in-context demonstrations are ordered top-to-bottom based on their sequence in the prompt.
  • Figure 5: Contribution of attention heads to the conceptual subspace in Llama-3.1 70B. a, Contribution strength ($\alpha$) of attention heads to the conceptual subspace. b, Directional alignment (cosine similarity) between attention head outputs and the conceptual subspace. Bordered cells highlight attention heads with statistically significant causal indirect effects (CIEs).
  • ...and 11 more figures
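
The subspace-overlap metric named in the Figure 2 caption (mean squared cosine of principal angles between SVD subspaces explaining 95% of the variance) can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names `svd_basis` and `subspace_overlap`, the mean-centering step, and the choice to average over the smaller of the two subspace dimensions are all assumptions not given in the captions.

```python
import numpy as np


def svd_basis(hidden, var_threshold=0.95):
    """Return an orthonormal basis (d x k) for the top singular directions
    of `hidden` (n_samples x d) that together explain `var_threshold` of
    the variance. Mean-centering is an assumption of this sketch."""
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    # Smallest k whose cumulative explained variance reaches the threshold.
    k = min(int(np.searchsorted(explained, var_threshold)) + 1, len(s))
    return vt[:k].T


def subspace_overlap(basis_a, basis_b):
    """Mean squared cosine of the principal angles between two subspaces,
    each given as a matrix with orthonormal columns. The singular values
    of A^T B are the cosines of the principal angles; when the subspaces
    differ in dimension, the mean runs over the smaller dimension."""
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    cosines = np.clip(cosines, 0.0, 1.0)  # guard against rounding error
    return float(np.mean(cosines**2))


if __name__ == "__main__":
    # Synthetic stand-ins for hidden states at two layers (hypothetical data).
    rng = np.random.default_rng(0)
    layer_a = rng.normal(size=(200, 32))
    layer_b = layer_a + 0.1 * rng.normal(size=(200, 32))
    overlap = subspace_overlap(svd_basis(layer_a), svd_basis(layer_b))
    print(f"subspace overlap: {overlap:.3f}")
```

An overlap of 1 means the two subspaces coincide and 0 means they are orthogonal, so a layer-by-layer matrix of this score would correspond to the kind of heatmap described for panel a.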