Theoretical Bounds for Stable In-Context Learning

Tongxi Wang, Zhuoyang Xia

TL;DR

This paper designs a two-stage observable estimator that requires no prior knowledge and returns a concrete prompt length with a prescribed failure probability, and derives a non-asymptotic sufficient sample-size requirement (a lower bound on $K$) under sub-Gaussian representations, which induces a conservative upper bound on the unknown stability threshold.

Abstract

In-context learning (ICL) is a pivotal capability for the practical deployment of large-scale language models, yet its stability heavily depends on the number of examples provided in the prompt. Existing methods lack computable theoretical guidance to determine the minimal number of examples required. Heuristic rules commonly used in practice are often overly conservative and non-verifiable, readily leading to either instability from insufficient examples or inefficiency from redundant ones. This paper proposes that ICL stability can be characterized via a spectral-coverage proxy: the smallest eigenvalue of a regularized empirical second-moment matrix of demonstration representations, turning prompt-length selection into a computable estimation problem. We derive a non-asymptotic sufficient sample-size requirement (a lower bound on $K$) under sub-Gaussian representations, which in turn induces a conservative upper bound on the unknown stability threshold. We design a two-stage observable estimator that requires no prior knowledge and returns a concrete prompt length with a prescribed failure probability. Experiments show that the resulting estimates consistently upper-bound empirical knee-points, and a lightweight calibration further tightens the gap to about $1.03$--$1.20\times$, providing verifiable guidance for practical ICL prompt design.
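The spectral-coverage proxy described above can be illustrated with a minimal sketch. It assumes the demonstration representations are already available as the rows of a NumPy array; the function names `spectral_coverage` and `smallest_stable_k`, the ridge value, and the target floor are hypothetical choices for illustration. The naive prefix scan below is not the paper's two-stage estimator, only a direct evaluation of the proxy on synthetic Gaussian features.

```python
import numpy as np


def spectral_coverage(features: np.ndarray, ridge: float = 1e-3) -> float:
    """Smallest eigenvalue of the regularized empirical second-moment matrix.

    `features` has shape (K, d): one representation per demonstration.
    """
    k, d = features.shape
    second_moment = features.T @ features / k       # (d, d) empirical second moment
    regularized = second_moment + ridge * np.eye(d)  # add the ridge term
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    return float(np.linalg.eigvalsh(regularized)[0])


def smallest_stable_k(features: np.ndarray, floor: float, ridge: float = 1e-3):
    """First prefix length K whose proxy reaches `floor`, or None if none does.

    Illustrative scan only (assumption): not the two-stage estimator of the paper.
    """
    for k in range(1, features.shape[0] + 1):
        if spectral_coverage(features[:k], ridge) >= floor:
            return k
    return None


# Synthetic stand-in for demonstration representations (assumption).
rng = np.random.default_rng(0)
demo_features = rng.standard_normal((200, 8))
k_hat = smallest_stable_k(demo_features, floor=0.3)
print("prefix length reaching the floor:", k_hat)
```

With fewer demonstrations than feature dimensions the empirical second-moment matrix is rank deficient, so the proxy equals the ridge term alone; this is why the scan cannot return a value below the feature dimension when the floor exceeds the ridge.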

Paper Structure

This paper contains 151 sections, 9 theorems, 105 equations, 1 figure, 12 tables, 1 algorithm.

Key Result

Proposition 3.3

Assume (A1) (sub-Gaussian features) and let $\xi\in(0,1)$. There exist universal constants $c,C>0$ such that if $\Delta_\rho>0$ and then the proxy objective holds:

Figures (1)

  • Figure 1: Schematic comparison of ICL prompt-length selection. (a) Conventional trial-and-error or heuristic approaches leave uncertainty about the minimal effective number of demonstrations ($K$). (b) Our proposed spectral-coverage method provides a theoretically grounded, observable proxy to determine a stable minimum prompt length ($K^*$), ensuring reliable in-context learning.

Theorems & Definitions (16)

  • Definition 3.1: ICL stability at $(\tau,\xi)$
  • Remark 3.2: Scope of the proxy and instantiation of $\phi$
  • Proposition 3.3: Spectral coverage from $K$ demonstrations
  • Lemma 3.4: Spectral floor $\Rightarrow$ bounded sensitivity of ridge parameters
  • Definition 1.1: ICL stability at $(\tau,\xi)$
  • Remark 1.2: Scope of the proxy and instantiation of $\phi$
  • Lemma 1.3: Spectral floor implies ridge stability
  • Lemma 4.1: Second-moment shift under bounded drift
  • proof
  • Lemma 4.2: Variance proxy under bounded drift
  • ...and 6 more
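
The exact statements of the lemmas listed above are not reproduced here. As a hedged illustration of why a spectral floor limits the sensitivity of ridge parameters, the sketch below checks a standard linear-algebra fact: for a symmetric positive definite matrix $A$, $\|A^{-1}\delta\|_2 \le \|\delta\|_2/\lambda_{\min}(A)$. The helper `ridge_solution`, the synthetic data, and the perturbation size are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def ridge_solution(second_moment: np.ndarray, cross_moment: np.ndarray, ridge: float) -> np.ndarray:
    """Solve (second_moment + ridge * I) theta = cross_moment."""
    d = second_moment.shape[0]
    return np.linalg.solve(second_moment + ridge * np.eye(d), cross_moment)


# Synthetic features and targets (assumption, for illustration only).
rng = np.random.default_rng(1)
n, d, ridge = 50, 6, 0.1
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

S = X.T @ X / n   # empirical second-moment matrix
b = X.T @ y / n   # empirical cross moment
delta_b = 1e-2 * rng.standard_normal(d)  # small perturbation of the cross moment

theta = ridge_solution(S, b, ridge)
theta_pert = ridge_solution(S, b + delta_b, ridge)

# Spectral floor: smallest eigenvalue of the regularized matrix.
floor = np.linalg.eigvalsh(S + ridge * np.eye(d))[0]

change = np.linalg.norm(theta_pert - theta)
bound = np.linalg.norm(delta_b) / floor
print("parameter change:", change)
print("bound from spectral floor:", bound)
```

A larger spectral floor shrinks the bound, which is the intuition behind using the smallest eigenvalue as a stability proxy.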