All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling

Emanuele Marconato, Sébastien Lachapelle, Sebastian Weichwald, Luigi Gresele

TL;DR

This work develops a rigorous identifiability framework for next-token predictors, showing that, under suitable conditions, distribution-equivalent models are related by extended-linear equivalence and thus share the same dot-product structure that governs next-token probabilities. It introduces effective complexity and an extended linear equivalence relation to generalize prior results and to unify several linear properties (such as parallelism, relational linearity, linear probing, and linear steering) within a coherent framework. The key finding is that, for many linear properties, either all or none of the distribution-equivalent models exhibit the property, provided certain subspace inclusion conditions hold; parallelism, however, can fail to be preserved in general. The work clarifies when empirically observed linear properties reflect universal aspects of the distribution rather than model-specific representations, with implications for interpreting and benchmarking language models and for guiding empirical analyses of representation learning.

Abstract

We analyze identifiability as a possible explanation for the ubiquity of linear properties across language models, such as the vector difference between the representations of "easy" and "easiest" being parallel to that between "lucky" and "luckiest". For this, we ask whether finding a linear property in one model implies that any model that induces the same distribution has that property, too. To answer that, we first prove an identifiability result to characterize distribution-equivalent next-token predictors, lifting a diversity requirement of previous results. Second, based on a refinement of relational linearity [Paccanaro and Hinton, 2001; Hernandez et al., 2024], we show how many notions of linearity are amenable to our analysis. Finally, we show that under suitable conditions, these linear properties hold either in all or in none of the distribution-equivalent next-token predictors.
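The question posed in the abstract can be illustrated numerically in its simplest case. The sketch below assumes (this parametrization is an assumption made for illustration, in line with the dot-product structure mentioned in the TL;DR) that a next-token predictor assigns $p(y \mid x) \propto \exp(f(x)^\top g(y))$. It then checks that reparametrizing the embeddings by an invertible matrix $A$ and the unembeddings by $A^{-\top}$ leaves the distribution unchanged and keeps parallel difference vectors parallel. All names (`F`, `G`, `A`, `next_token_probs`, `is_parallel`) are hypothetical, and the example covers only invertible linear reparametrizations, not the more general extended-linear equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_contexts, n_tokens = 4, 6, 5

# Rows of F are embeddings f(x); rows of G are unembeddings g(y).
F = rng.normal(size=(n_contexts, d))
G = rng.normal(size=(n_tokens, d))
# Build a parallelism by hand: g(3) - g(2) is a multiple of g(1) - g(0).
G[3] = G[2] + 2.0 * (G[1] - G[0])


def next_token_probs(F, G):
    """Softmax over tokens of the dot products f(x)^T g(y) (assumed model form)."""
    logits = F @ G.T
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def is_parallel(u, v, tol=1e-8):
    """True if u and v are collinear (equality case of Cauchy-Schwarz)."""
    bound = np.linalg.norm(u) * np.linalg.norm(v)
    return bool(abs(abs(u @ v) - bound) <= tol * bound)


# A symmetric positive-definite matrix is guaranteed to be invertible.
B = rng.normal(size=(d, d))
A = B @ B.T + np.eye(d)

F_tilde = F @ A.T               # f~(x) = A f(x)
G_tilde = G @ np.linalg.inv(A)  # g~(y) = A^{-T} g(y)

p = next_token_probs(F, G)
p_tilde = next_token_probs(F_tilde, G_tilde)

print("same distribution:", np.allclose(p, p_tilde))
print("parallel before:", is_parallel(G[1] - G[0], G[3] - G[2]))
print("parallel after:", is_parallel(G_tilde[1] - G_tilde[0], G_tilde[3] - G_tilde[2]))
```

The dot products are preserved because $(A f)^\top (A^{-\top} g) = f^\top g$, and an invertible linear map sends collinear vectors to collinear vectors.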

Paper Structure

This paper contains 42 sections, 23 theorems, 180 equations, 4 figures.

Key Result

Lemma 1

Given the orthogonal projectors ${\bm{\mathrm{P}}}_\mathcal{F}$ and ${\bm{\mathrm{P}}}_\mathcal{G}$, and the orthogonal projectors ${\bm{\mathrm{P}}}_\mathcal{M}$ and ${\bm{\mathrm{P}}}_\mathcal{N}$ onto, respectively, the spaces $\mathcal{M}$ and $\mathcal{N}$, defined as in Equation eq:left-right-

Figures (4)

  • Figure 1: Identifiability of linear properties. Plots in the left and right dotted squares show the embeddings of two next-token predictors $({\bm{\mathrm{f}}}, {\bm{\mathrm{g}}}), (\tilde{{\bm{\mathrm{f}}}}, \tilde{{\bm{\mathrm{g}}}}) \in \Theta$ that generate the same distribution $p_{{\bm{\mathrm{f}}}, {\bm{\mathrm{g}}}} = p_{\tilde{{\bm{\mathrm{f}}}}, \tilde{{\bm{\mathrm{g}}}}}$ within a set of conditional distributions $p(y \mid {\bm{\mathrm{x}}})$. Theorem \ref{thm:partial-identifiability} proves a one-to-one correspondence (the dashed orange arrow) between conditional distributions and $\sim_{EL}$-equivalent models (the orange partitions of $\Theta$). This extends a result by Roeder et al. [2021] characterizing $\sim_L$-equivalent models (green partitions of $\Theta$). Here $({\bm{\mathrm{f}}}, {\bm{\mathrm{g}}}) \textcolor{orange}{\sim_{EL}} (\tilde{{\bm{\mathrm{f}}}}, \tilde{{\bm{\mathrm{g}}}})$ while the embedding representations are not equal up to a linear transformation (thus $({\bm{\mathrm{f}}}, {\bm{\mathrm{g}}}) \textcolor{darkgreen}{\not \sim_{L}} (\tilde{{\bm{\mathrm{f}}}}, \tilde{{\bm{\mathrm{g}}}})$), as shown by how the purple and blue parallelograms in the embeddings of the left model $({\bm{\mathrm{f}}}, {\bm{\mathrm{g}}})$ get distorted in those of the right model $(\tilde{{\bm{\mathrm{f}}}}, \tilde{{\bm{\mathrm{g}}}})$. Both models display relational linear probing for the query ${\bm{\mathrm{q}}} =$ "Is the text written in English?": one can linearly separate the embeddings of textual inputs which, when concatenated with ${\bm{\mathrm{q}}}$, have "yes" as the likeliest next token, from those that yield "no". In Proposition \ref{prop:part-id-lin-rep-tentative}, we provide conditions under which all or none of the models in the $\textcolor{orange}{\sim_{EL}}$ equivalence class share the same linear property.
  • Figure 2: Illustration of the $\sim_{EL}$ equivalence relation. (Left) In the leftmost model, $({\bm{\mathrm{f}}}, {\bm{\mathrm{g}}})$, the embeddings lie on a manifold $\mathrm{Im}({\bm{\mathrm{f}}})\subsetneq \mathbb{R}^3$, yielding $\mathrm{SIm}({\bm{\mathrm{f}}}) = \mathbb{R}^3$. To ease visualization, $\mathrm{Im}({\bm{\mathrm{f}}})$ is plotted as a continuous manifold in the figure, although in practice textual inputs are discrete. The unembeddings lie on a two-dimensional space, $\mathrm{SIm}({\bm{\mathrm{g}}}_0) \cong \mathbb{R}^2$, drawn in light blue. Consequently, the projectors ${\bm{\mathrm{P}}}_\mathcal{M}$ and ${\bm{\mathrm{P}}}_\mathcal{N}$ map onto a two-dimensional subspace, i.e., ${\bm{\mathrm{P}}}_\mathcal{M} = {\bm{\mathrm{P}}}_\mathcal{N} = {\bm{\mathrm{P}}}_\mathcal{G}$. (Right) The rightmost model, $(\tilde{{\bm{\mathrm{f}}}}, \tilde{{\bm{\mathrm{g}}}})$, represents both the embeddings and the unembeddings in a two-dimensional space. We therefore have $\mathrm{SIm}(\tilde{{\bm{\mathrm{f}}}}) = \mathrm{SIm}(\tilde{{\bm{\mathrm{g}}}}_0) = \mathbb{R}^2$, which implies that ${\bm{\mathrm{P}}}_{\tilde{\mathcal{M}}} = {\bm{\mathrm{P}}}_{\tilde{\mathcal{N}}} = {\bm{\mathrm{I}}}$. Thus, applying these projection matrices to embeddings and unembeddings leaves them unchanged (top-right and bottom-right grids). (Center) The equivalence relation $\sim_{EL}$ specifies that both ${\bm{\mathrm{P}}}_\mathcal{M} {\bm{\mathrm{f}}}$ and ${\bm{\mathrm{P}}}_{\tilde{\mathcal{M}}} \tilde{{\bm{\mathrm{f}}}}$, as well as ${\bm{\mathrm{P}}}_\mathcal{N} {\bm{\mathrm{g}}}$ and ${\bm{\mathrm{P}}}_{\tilde{\mathcal{N}}} \tilde{{\bm{\mathrm{g}}}}$, are related by linear invertible transformations defined by the matrices ${\bm{\mathrm{M}}}, {\bm{\mathrm{N}}} \in \mathbb{R}^{3 \times 2}$.
  • Figure 3: Relational linear subspaces. The figure depicts the embedding function ${\bm{\mathrm{f}}}$ of a model $({\bm{\mathrm{f}}}, {\bm{\mathrm{g}}}) \in \Theta$ with representation dimension $d=2$. Let ${\bm{\mathrm{g}}}_o(\textit{"English"}) := {\bm{\mathrm{g}}}(\textit{"English"}) - {\bm{\mathrm{g}}}(\textit{"other language"})$ and ${\bm{\mathrm{g}}}_{n}(\textit{"yes"}) := {\bm{\mathrm{g}}}(\textit{"yes"}) - {\bm{\mathrm{g}}}(\textit{"no"})$. Here, $({\bm{\mathrm{f}}}, {\bm{\mathrm{g}}})$ linearly represents (Definition \ref{def:relational-linear-subspaces}) the subspace spanned by ${\bm{\mathrm{g}}}_o(\textit{"English"})$ for the query ${\bm{\mathrm{q}}} =$ "Is the text written in English?". Accordingly, there exists a vector, here ${\bm{\mathrm{g}}}_n(\textit{"yes"})$, such that the dot product ${\bm{\mathrm{g}}}_o(\textit{"English"})^\top {\bm{\mathrm{f}}}({\bm{\mathrm{s}}})$, whose magnitude is represented through the color map on the right, matches the dot product ${\bm{\mathrm{g}}}_n(\textit{"yes"})^\top{\bm{\mathrm{f}}}({\bm{\mathrm{s}}} \mathbin{\smallfrown} {\bm{\mathrm{q}}})$, on the left. For ease of visualization, we set ${\bm{\mathrm{g}}}_n(\textit{"yes"})^\top {\bm{\mathrm{a}}}_{\bm{\mathrm{q}}} = 0$ and we display the values of the dot products for two input contexts ${\bm{\mathrm{s}}}_1, {\bm{\mathrm{s}}}_2$. Intuitively, the dot product of a context's embedding ${\bm{\mathrm{f}}}({\bm{\mathrm{s}}})$ with ${\bm{\mathrm{g}}}_{o}(\textit{"English"})$ captures the log-probability ratio of "yes" vs. "no" as next tokens for the same context ${\bm{\mathrm{s}}}$ concatenated with the query ${\bm{\mathrm{q}}}$.
  • Figure 4: Allowed distortions among $\sim_{EL}$-equivalent models. On the left, the model embeddings ${\bm{\mathrm{f}}}$ are given different colors. On the right, the red segment is non-linearly transformed along $\tilde{{\bm{\mathrm{f}}}}_1$, whereas the embeddings remain equal to those on the left along the component $\tilde{{\bm{\mathrm{f}}}}_2$. This shows that the models are not $\sim_L$-equivalent.
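
The situation described in Figures 2 and 4 can be reproduced with a small numerical sketch. It assumes (these are illustrative assumptions, not definitions taken from the captions) that $p(y \mid x) \propto \exp(f(x)^\top g(y))$ and that ${\bm{\mathrm{g}}}_0(y)$ denotes the difference $g(y) - g(y_0)$ with respect to a pivot token $y_0$. When these differences span only a two-dimensional subspace of $\mathbb{R}^3$, the embeddings can be distorted non-linearly along the orthogonal direction without changing the distribution, so the two models are not linearly related even though their projected embeddings agree. The names `F`, `G`, `normal`, `P`, and `h` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_contexts, n_tokens = 50, 6

# Unembeddings in R^3: free first two coordinates, shared third coordinate.
# The differences to the pivot token (row 0) therefore lie in the xy-plane.
G = np.zeros((n_tokens, 3))
G[:, :2] = rng.normal(size=(n_tokens, 2))
G[:, 2] = 0.7
normal = np.array([0.0, 0.0, 1.0])  # direction orthogonal to all g(y) - g(y_0)

F = rng.normal(size=(n_contexts, 3))  # embeddings spanning R^3


def next_token_probs(F, G):
    """Softmax over tokens of the dot products f(x)^T g(y) (assumed model form)."""
    logits = F @ G.T
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


# Non-linear distortion of the embeddings along the orthogonal direction only.
h = F[:, 0] ** 2
F_tilde = F + h[:, None] * normal

p = next_token_probs(F, G)
p_tilde = next_token_probs(F_tilde, G)

# Best linear map A with F_tilde ~ F @ A, and the error it leaves behind.
A, *_ = np.linalg.lstsq(F, F_tilde, rcond=None)
residual = np.linalg.norm(F @ A - F_tilde)

# Orthogonal projector onto the plane spanned by the unembedding differences.
P = np.eye(3) - np.outer(normal, normal)

print("same distribution:", np.allclose(p, p_tilde))
print("linear-fit residual:", residual)
print("projected embeddings agree:", np.allclose(F @ P, F_tilde @ P))
```

The distortion adds the same amount $0.7\,h(x)$ to every logit of a given context, which the softmax normalization cancels. In this toy case the projected embeddings coincide exactly; in general they would only need to agree up to an invertible linear map.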

Theorems & Definitions (52)

  • Definition 1: Diversity condition
  • Example 1
  • Lemma 1
  • Definition 1: Extended linear equivalence
  • Proposition 1
  • Theorem 2
  • Corollary 2: Adapted from Roeder et al. [2021]
  • Definition 3: Parallelism in $\Gamma$
  • Lemma 3
  • Example 2
  • ...and 42 more