Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs

Mingyu Jin, Yutong Yin, Jingcheng Niu, Qingcheng Zeng, Wujiang Xu, Mengnan Du, Wei Cheng, Zhaoran Wang, Tianlong Chen, Dimitris N. Metaxas

TL;DR

This study provides new mechanistic insights into how LLMs internalize OOD challenges, and designs Sparsity-Guided Curriculum In-Context Learning (SG-ICL), a strategy that explicitly uses representation sparsity to schedule few-shot demonstrations, leading to considerable performance enhancements.

Abstract

In this work, we investigate how Large Language Models (LLMs) adapt their internal representations when encountering inputs of increasing difficulty, quantified as the degree of out-of-distribution (OOD) shift. We reveal a consistent and quantifiable phenomenon: as task difficulty increases, whether through harder reasoning questions, longer contexts, or additional answer choices, the last hidden states of LLMs become substantially sparser. In short, the farther the shift, the sparser the representations. This sparsity-difficulty relation is observable across diverse models and domains, suggesting that language models respond to unfamiliar or complex inputs by concentrating computation into specialized subspaces in the last hidden state. Through a series of controlled analyses with a learning dynamic explanation, we demonstrate that this sparsity is not incidental but an adaptive mechanism for stabilizing reasoning under OOD. Leveraging this insight, we design Sparsity-Guided Curriculum In-Context Learning (SG-ICL), a strategy that explicitly uses representation sparsity to schedule few-shot demonstrations, leading to considerable performance enhancements. Our study provides new mechanistic insights into how LLMs internalize OOD challenges. The source code is available at https://github.com/MingyuJ666/sparsityLLM.
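
The idea of scheduling demonstrations by representation sparsity can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the function names `top_fraction_energy` and `order_demonstrations`, the definition of the energy score (share of squared-L2 energy in the largest-magnitude coordinates), and the easy-to-hard ordering (least sparse first). The caller supplies `hidden_state_fn`, a hypothetical function that returns the last hidden state for a demonstration.

```python
from typing import Callable, List, Sequence


def top_fraction_energy(hidden: Sequence[float], fraction: float = 0.10) -> float:
    """Share of squared-L2 energy held by the largest-magnitude coordinates.

    Assumed definition (not the paper's): values near 1.0 mean that a few
    coordinates carry almost all of the energy, i.e. a sparser vector.
    """
    squares = sorted((x * x for x in hidden), reverse=True)
    total = sum(squares)
    if total == 0.0:
        return 0.0  # an all-zero vector has no energy to concentrate
    k = max(1, round(len(squares) * fraction))  # keep at least one coordinate
    return sum(squares[:k]) / total


def order_demonstrations(
    demos: List[str],
    hidden_state_fn: Callable[[str], Sequence[float]],
    fraction: float = 0.10,
) -> List[str]:
    """Sort demonstrations from least sparse to most sparse.

    Assumption: lower sparsity is treated as "easier", so the result is an
    easy-to-hard curriculum for the few-shot prompt.
    """
    return sorted(
        demos,
        key=lambda d: top_fraction_energy(hidden_state_fn(d), fraction),
    )


# Toy usage with hand-written vectors standing in for model hidden states.
toy_states = {
    "easy question": [1.0] * 10,           # energy spread evenly
    "hard question": [5.0] + [0.0] * 9,    # energy in one coordinate
}
curriculum = order_demonstrations(
    ["hard question", "easy question"], toy_states.__getitem__
)
```

In practice the toy dictionary would be replaced by a call that runs the model and extracts the last hidden state; the sorting logic is independent of how that vector is obtained.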

Paper Structure (62 sections, 11 theorems, 65 equations, 15 figures, 4 tables)

Key Result

Lemma 3.2

For all $t$ for which the trajectories are differentiable,

Figures (15)

  • Figure 1: Harder Inputs Induce Sparser Representations. Across all four controlled difficulty axes, the last hidden states become progressively sparser as tasks get harder. Results are shown for Qwen2.5-3B using Top-10% Energy; nevertheless, the same trend holds across difficulty settings, sparsity metrics, and LLM sizes.
  • Figure 2: Overview of Sparsity Analysis. Together, the two subfigures paint a consistent picture: the left subfigure shows that difficulty increases sparsity, while the right subfigure shows that sparsity tracks accuracy degradation.
  • Figure 3: Sparsity Metrics under Answer Choice Expansion. Bar plots show mean sparsity across 14 disciplines for five metrics under Normal (+0), Moderate Expansion (+5), and Large Expansion (+10) on Qwen2.5-3B. Error bars indicate the minimum and maximum across disciplines. Increasing task difficulty leads to higher sparsity.
  • Figure 4: Sparsity Differences under Knowledge Conflict. We measure the last hidden state sparsity under two conditions (non-conflict and conflict) across five metrics for Qwen2.5-3B. All results are statistically significant. Arrows denote how each metric relates to sparsity ($\uparrow$: higher is sparser; $\downarrow$: lower is sparser). Again, the harder conflict condition is consistently sparser than the non-conflict condition across all metrics.
  • Figure 5: Layer-wise Sparsity across Context Lengths. While intermediate layers show minimal variation across contexts, the final layers exhibit sharp divergence: longer contexts consistently produce sparser representations. This experiment uses LongReasonQA (li2025longcontext), which allows the background context length to be controlled.
  • ...and 10 more figures
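
The captions above mention metrics that point in opposite directions ($\uparrow$: higher is sparser; $\downarrow$: lower is sparser). The sketch below shows one standard example of each direction: the L1/L2 norm ratio (lower is sparser) and Hoyer sparsity (higher is sparser). These two are chosen only for illustration; the page does not name the paper's five metrics, so nothing here should be read as the authors' metric set.

```python
import math
from typing import Sequence


def l1_l2_ratio(v: Sequence[float]) -> float:
    """L1 norm divided by L2 norm. Lower values indicate a sparser vector.

    Ranges from 1 (a single non-zero entry) to sqrt(n) (all entries equal
    in magnitude).
    """
    l1 = sum(abs(x) for x in v)
    l2 = math.sqrt(sum(x * x for x in v))
    if l2 == 0.0:
        raise ValueError("ratio is undefined for an all-zero vector")
    return l1 / l2


def hoyer_sparsity(v: Sequence[float]) -> float:
    """Hoyer sparsity in [0, 1]. Higher values indicate a sparser vector.

    It rescales the L1/L2 ratio so that a one-hot vector scores 1 and a
    vector with equal-magnitude entries scores 0.
    """
    n = len(v)
    if n < 2:
        raise ValueError("need at least two coordinates")
    root_n = math.sqrt(n)
    return (root_n - l1_l2_ratio(v)) / (root_n - 1.0)


# Toy vectors standing in for hidden states (not model outputs).
dense = [1.0] * 16             # energy spread over every coordinate
peaked = [4.0] + [0.0] * 15    # energy in a single coordinate

dense_scores = (l1_l2_ratio(dense), hoyer_sparsity(dense))
peaked_scores = (l1_l2_ratio(peaked), hoyer_sparsity(peaked))
```

On these toy vectors the peaked vector has the lower L1/L2 ratio and the higher Hoyer score, so both metrics agree that it is sparser even though they move in opposite directions.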

Theorems & Definitions (14)

  • Remark 3.1: A simple setting where Eq. (eq:appC_induced_h_dynamics) holds exactly
  • Lemma 3.2: Exact drift identity
  • Lemma 3.3: Two-sided decay bound
  • Lemma 3.5: Uniform bound on $|D_\varepsilon|$ on Phase I
  • Lemma 3.6: Phase I dynamic
  • Corollary 3.7: Phase I decrease trend + certified hitting time
  • Remark 3.8: What Phase I does and does not claim
  • Lemma 3.10: Top-2 dominance from runner-up separation
  • Lemma 3.11: Diagonal negativity on $S$
  • Lemma 3.12: Uniform negativity of $D_\varepsilon$ on the Phase II window (with easy-complement control)
  • ...and 4 more