Sequences of Logits Reveal the Low Rank Structure of Language Models

Noah Golowich, Allen Liu, Abhishek Shetty

TL;DR

The paper investigates the intrinsic low-rank structure of language models by analyzing the extended logit matrix, which encodes log-probabilities across histories and futures. Empirically, the authors show that this matrix is well approximated by a low-rank matrix across diverse models and datasets: its singular values decay according to a power law, and its rank remains manageable as the horizon grows. They exploit this structure to develop Lingen, a linear generation method that samples continuations of a target prompt using logits from unrelated prompts. They also construct the time-varying Input Switched Affine Network (ISAN), a simple generative model that is provably equivalent to low-rank logit behavior, and give learning guarantees based on logit queries. The work connects empirical observations with theory, suggests implications for efficiency, interpretability, and safety, and points to future directions in training dynamics, model-stealing risks, and defense strategies.
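To make "well approximated by a low-rank matrix" concrete, the following is a minimal NumPy sketch on synthetic data. It is not the paper's experiment: the matrix `L` is a random stand-in for an extended logit matrix, the sizes are arbitrary, and the error is the relative Frobenius norm rather than the average KL divergence the paper reports.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hist, n_cols, true_rank = 60, 80, 5  # hypothetical sizes, chosen for illustration

# Synthetic stand-in for an extended logit matrix: rank-5 signal plus small noise.
L = rng.normal(size=(n_hist, true_rank)) @ rng.normal(size=(true_rank, n_cols))
L += 0.01 * rng.normal(size=L.shape)

# Singular value decomposition; s holds the singular values in decreasing order.
U, s, Vt = np.linalg.svd(L, full_matrices=False)

def truncate(r):
    """Best rank-r approximation in Frobenius norm (Eckart-Young)."""
    return (U[:, :r] * s[:r]) @ Vt[:r]

def rel_error(r):
    """Relative Frobenius error of the rank-r truncation."""
    return np.linalg.norm(L - truncate(r)) / np.linalg.norm(L)

errors = {r: rel_error(r) for r in (1, 2, 5, 10)}
for r, e in errors.items():
    print(f"rank {r:2d}: relative error {e:.4f}")
```

In this toy setting the error drops sharply once the truncation rank reaches the planted rank. On real models the paper instead reports a gradual power-law decay of the singular values.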

Abstract

A major problem in the study of large language models is to understand their inherent low-dimensional structure. We introduce an approach to study the low-dimensional structure of language models at a model-agnostic level: as sequential probabilistic models. We first empirically demonstrate that a wide range of modern language models exhibit low-rank structure: in particular, matrices built from the model's logits for varying sets of prompts and responses have low approximate rank. We then show that this low-rank structure can be leveraged for generation -- in particular, we can generate a response to a target prompt using a linear combination of the model's outputs on unrelated, or even nonsensical prompts. On the theoretical front, we observe that studying the approximate rank of language models in the sense discussed above yields a simple universal abstraction whose theoretical predictions parallel our experiments. We then analyze the representation power of the abstraction and give provable learning guarantees.
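The idea of generating from a linear combination of outputs on other prompts can be illustrated with a toy calculation. The sketch below is not Lingen itself. It assumes an exactly low-rank synthetic matrix whose rows play the role of prompts, and it assumes that a few entries of the target row are observed so that the combination coefficients can be fitted by least squares. The names `support`, `target`, and `n_known` are hypothetical, and the paper's procedure may obtain its coefficients differently.

```python
import numpy as np

rng = np.random.default_rng(1)
rank, n_support, n_cols, n_known = 4, 12, 50, 10  # hypothetical sizes

# Exactly rank-4 synthetic "logit" rows: 12 support prompts and one target prompt
# that share the same column factors.
col_factors = rng.normal(size=(rank, n_cols))
support = rng.normal(size=(n_support, rank)) @ col_factors
target = rng.normal(size=rank) @ col_factors

# Fit coefficients on the first n_known columns, where the target is assumed observed.
# rcond is set explicitly so numerically zero singular values are discarded.
coef, *_ = np.linalg.lstsq(support[:, :n_known].T, target[:n_known], rcond=1e-10)

# Reuse the same coefficients to predict the target on the remaining columns.
predicted = coef @ support[:, n_known:]
max_err = np.max(np.abs(predicted - target[n_known:]))
print(f"max abs error on unseen columns: {max_err:.2e}")
```

Because every row lies in the same 4-dimensional row space, coefficients that match the target on at least `rank` generic columns also match it on all other columns. With a matrix that is only approximately low rank, the prediction would be approximate as well.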

Paper Structure

This paper contains 52 sections, 7 theorems, 37 equations, 11 figures, 3 tables, 3 algorithms.

Key Result

Theorem 4.3

Let $M$ be a language model over sequences of length $T$. Then $M$ is expressible as a time-varying ISAN with hidden dimension $d$ if and only if the logit matrix $\mathcal{L}_{M}(\Sigma^{t}, \Sigma^{\leq T - t})$ has rank at most $d$ for all $t \leq T$.
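One direction of this equivalence can be checked numerically on a toy example. The sketch below assumes a simplified form of a time-varying ISAN, which may differ in details from the paper's Definition 4.2: the hidden state is updated by a linear map chosen by position and token, and next-token logits are a linear readout of the state. The parameters `A`, `B`, `h0`, the tiny sizes, and the indexing convention (futures of length 0 to $T - t - 1$, so that every predicted token lies within length $T$) are all assumptions made for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
V, d, T, t = 3, 2, 4, 2  # vocab size, hidden dim, sequence length, history length

# Hypothetical simplified time-varying ISAN (an assumption, not Definition 4.2 verbatim):
# state update h_s = A[s, z_s] @ h_{s-1}; next-token logits after s tokens = B[s] @ h_s.
A = rng.normal(size=(T, V, d, d))
B = rng.normal(size=(T + 1, V, d))
h0 = rng.normal(size=d)

def state(tokens, h=h0, start=0):
    """Apply the position- and token-dependent linear updates to the state h."""
    for s, z in enumerate(tokens, start=start):
        h = A[s, z] @ h
    return h

def centered_logits(h, step):
    """Mean-centered next-token logits read out linearly from the state."""
    logits = B[step] @ h
    return logits - logits.mean()

histories = list(itertools.product(range(V), repeat=t))
futures = [f for k in range(T - t) for f in itertools.product(range(V), repeat=k)]

# Rows are histories; columns are (future, next token) pairs.
rows = []
for hist in histories:
    hs = state(hist)
    row = []
    for f in futures:
        hf = state(f, h=hs, start=t)
        row.extend(centered_logits(hf, t + len(f)))
    rows.append(row)

L = np.array(rows)
rank = np.linalg.matrix_rank(L, tol=1e-8)
print(L.shape, rank)
```

Every entry of a row is a linear function of the $d$-dimensional state reached after the history, and mean-centering is itself linear, so the matrix factors through $\mathbb{R}^d$ and its rank cannot exceed $d$. The converse direction, building such a model from a low-rank logit matrix, is the part that needs the paper's proof.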

Figures (11)

  • Figure 1: (a): Low-rank approximation error (measured by average KL divergence; see Definition 3.1) of the extended logit matrix for OLMo-7b, for ranks 5-500. For fixed sets $\mathcal{H}, \mathcal{F}$, the approximation errors for the logit matrix $\mathcal{L}_M(\mathcal{H}, \mathcal{F})$ follow a power law similar to that of various sub-matrices with $\{2, 4, 8, 16\}$-times fewer entries. The dashed line at the top shows the performance of a (suboptimal) rank-1 baseline. (b): Performance of our generation procedure (Lingen; star markers), which exploits the low-rank structure of the extended logit matrix to generate from a given "target" prompt by only querying the language model on nonsensical prompts unrelated to the target. We plot the KL divergence between Lingen and the true language model (OLMo-1b) at each token position, as well as that of various baselines (solid lines; see the section on generation experiments).
  • Figure 2: OLMo-7b singular values; Power law exponent $\alpha \approx 0.536$
  • Figure 3: Low-rank approximations (with respect to average KL divergence) over Stage-1 pretraining of OLMo-1b.
  • Figure 4: Cosines of principal angles between the column spaces of low-rank approximations of $\mathcal{L}_M(\mathcal{H},\mathcal{F})$ and $\mathcal{L}_M(\mathcal{H}, \mathcal{F}^{\mathsf{nonsense}})$.
  • Figure 5: Lingen with OLMo-1b.
  • ...and 6 more figures

Theorems & Definitions (38)

  • Definition 2.1: Mean-Centered Logits
  • Definition 2.2: Extended Logit Matrix
  • Definition 3.1: Average KL divergence
  • Definition 4.1: Logit Rank
  • Definition 4.2: Time-varying ISAN
  • Theorem 4.3: Equivalence between Low Logit Rank and Time-Varying ISAN
  • Theorem 4.4
  • proof
  • proof
  • Definition C.1: Input Switched Affine Network (ISAN)
  • ...and 28 more