Semantic Anchors in In-Context Learning: Why Small LLMs Cannot Flip Their Labels

Anantha Padmanaban Krishna Kumar

TL;DR

The paper asks whether in-context learning (ICL) can override pre-trained label semantics in open-source LLMs or whether it merely refines existing semantic priors. It compares natural demonstrations with inverted demonstrations and decomposes model behavior into truth, prior, and prompt alignment. Across eight open-source models of 1–12B parameters (drawn from the LLaMA, Mistral, Qwen, and Gemma families) and eight classification tasks, semantic anchors prove robust: natural ICL improves accuracy while preserving prior alignment, whereas inverted ICL fails to produce a coherent anti-semantic mapping and yields a semantic override rate of exactly zero. The paper introduces the semantic override rate as a metric and concludes that ICL operates through prior refinement within a stable semantic space rather than by remapping label meanings. Practically, few-shot prompting cannot substitute for interventions such as symbol tuning or fine-tuning when non-standard label semantics are required at these scales, and the alignment decomposition offers a diagnostic for when ICL will be effective. The results also bear on the geometric nature of semantic labels in pre-trained representations and set clear limits on the flexibility of few-shot prompting.

Abstract

Can in-context learning (ICL) override pre-trained label semantics, or does it merely refine an existing semantic backbone? We address this question by treating LLMs as prompt-induced classifiers and contrasting their behavior under natural demonstrations (with correct labels) and inverted demonstrations (systematically flipping label meanings). We decompose ICL behavior into three alignment metrics (truth, prior, and prompt alignment) and introduce a semantic override rate, defined as correctness under flipped semantics. Across eight classification tasks and eight open-source LLMs (1–12B parameters), we find consistent evidence for a semantic anchor view. With natural demonstrations, ICL improves accuracy while maintaining strong prior alignment; most correct predictions coincide with zero-shot behavior, even when the prior is weak. With inverted demonstrations, models cannot learn coherent anti-semantic classifiers: prompt alignment increases only by sacrificing accuracy, and semantic override rates remain exactly zero in our few-shot 1–12B setting. Rather than flexibly remapping label meanings, ICL primarily adjusts how inputs project onto stable semantic directions learned during pre-training, clarifying fundamental limits of few-shot prompting and suggesting that overriding label semantics at these scales requires interventions beyond ICL. All code is available at: https://github.com/AnanthaPadmanaban-KrishnaKumar/semantic-anchors-icl.
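
To make the three alignment metrics concrete, here is a minimal Python sketch. It is not taken from the paper's repository, and the definitions are assumptions read off the abstract's wording: truth alignment is taken as agreement with the gold label, prior alignment as agreement with the model's own zero-shot prediction, and prompt alignment as agreement with the label the demonstrations imply (the flipped label under inverted demonstrations). The names `flip` and `alignment_rates` and the toy data are hypothetical. The abstract does not spell out how the semantic override rate is computed, so the sketch does not attempt it.

```python
from typing import Dict, Sequence


def flip(label: str, label_set: Sequence[str]) -> str:
    """Return the other label of a two-label set (assumed binary inversion)."""
    first, second = label_set
    return second if label == first else first


def alignment_rates(
    icl_preds: Sequence[str],
    zero_shot_preds: Sequence[str],
    gold: Sequence[str],
    demo_implied: Sequence[str],
) -> Dict[str, float]:
    """Fraction of examples on which the ICL prediction agrees with each reference.

    These are assumed definitions, not the paper's exact formulas.
    """
    n = len(gold)
    return {
        # agreement with the ground-truth label
        "truth": sum(p == g for p, g in zip(icl_preds, gold)) / n,
        # agreement with the model's zero-shot prediction
        "prior": sum(p == z for p, z in zip(icl_preds, zero_shot_preds)) / n,
        # agreement with the label implied by the demonstrations
        "prompt": sum(p == d for p, d in zip(icl_preds, demo_implied)) / n,
    }


# Toy data, invented purely for illustration.
labels = ("positive", "negative")
gold = ["positive", "negative", "positive", "negative"]
zero_shot = ["positive", "negative", "negative", "negative"]
icl_inverted = ["positive", "positive", "negative", "negative"]

# Under inverted demonstrations the prompt-implied label is the flipped gold label.
implied = [flip(g, labels) for g in gold]

rates = alignment_rates(icl_inverted, zero_shot, gold, implied)
print(rates)
```

In this sketch, natural demonstrations would pass `gold` itself as `demo_implied`, so prompt alignment and truth alignment coincide. Under inverted demonstrations with two labels they compete, which matches the abstract's description of prompt alignment rising only at the expense of accuracy.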

Paper Structure

This paper contains 23 sections, 1 equation, 2 figures, 14 tables.

Figures (2)

  • Figure 1: Macro-averaged alignment probabilities for LLaMA-3.1-8B-Instruct. Left: natural ICL increases both accuracy and joint correctness while maintaining prior alignment. Right: inverted ICL degrades accuracy and prior alignment as prompt-following increases, but joint alignment remains zero.
  • Figure 2: Accuracy vs. demonstrations $k$ for LLaMA-3.1-8B-Instruct. Natural ICL (blue) improves performance; inverted ICL (orange) degrades systematically with more examples.