Transformer Injectivity & Geometric Robustness - Analytic Margins and Bi-Lipschitz Uniformity of Sequence-Level Hidden States

Mikael von Strauss

TL;DR

The paper analyzes whether finite prompts map injectively to last-token hidden states in decoder-only Transformers under real-analytic assumptions. It introduces layerwise collision discriminants Δ^ℓ and injective strata 𝒰^ℓ, proving a dichotomy: each layer is either nowhere injective or injective on an open dense set, with injectivity persisting along smooth training trajectories when initialization is absolutely continuous and updates are non-singular. It extends the theory to quotient spaces Θ/G to account for internal symmetries, and couples it with empirical layerwise diagnostics—separation margins and co-Lipschitz constants—to quantify how close a model is to collisions. The empirical study on LLaMA-3 and Qwen across contexts, quantization levels, and training trajectories shows no exact collisions in full precision or 8-bit, some collisions under 4-bit quantization, and a robust, architecture-dependent geometry that stabilizes under normalization. Overall, the work provides a theoretical framework for generic injectivity in Transformers and a practical geometric toolkit for assessing near-invertibility in real-world models and perturbations.

Abstract

Under real-analytic assumptions on decoder-only Transformers, recent work shows that the map from discrete prompts to last-token hidden states is generically injective on finite prompt sets. We refine this picture: for each layer $\ell$ we define a collision discriminant $\Delta^\ell \subset \Theta$ and injective stratum $\mathcal{U}^\ell = \Theta \setminus \Delta^\ell$, and prove a dichotomy: either the model is nowhere injective on the set, or $\mathcal{U}^\ell$ is open and dense and every $F^\ell_\theta$ with $\theta \in \mathcal{U}^\ell$ is injective. Under mild non-singularity assumptions on the optimizer and an absolutely continuous initialization, generic injectivity persists along smooth training trajectories over any fixed horizon. We also treat symmetry groups $G$, showing that discriminants and injective strata descend to the quotient $\Theta/G$, so injectivity is naturally a property of functional equivalence classes. We complement these results with an empirical study of layerwise geometric diagnostics. We define a separation margin and a co-Lipschitz (lower Lipschitz) constant between prompt space and last-token representation space, estimated via nearest-neighbor statistics on large prompt sets. Applying these diagnostics to pretrained LLaMA-3 and Qwen models, we study behavior across layers, sequence lengths, model scales, and 8- and 4-bit activation quantization. On our sampled prompts we see no collisions in full precision or at 8 bits, while 4-bit quantization induces a small number of collisions and markedly shrinks co-Lipschitz estimates. For a small GPT-2 trained from scratch, normalized metrics remain stable over training. Overall, the results suggest that Transformer representations are generically and persistently injective in the continuous-parameter idealization, while their practical invertibility can be probed using simple geometric diagnostics.
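
The nearest-neighbor diagnostics described in the abstract can be illustrated with a minimal Python sketch. The exact definitions are not given on this page, so several choices here are assumptions: the function name `layer_diagnostics` is hypothetical, prompt distance is taken to be Hamming distance between equal-length token sequences, the margin is the $q$-th percentile of nearest-neighbor distances, the co-Lipschitz estimate is the $q$-th percentile of the ratio of representation distance to prompt distance on nearest-neighbor pairs, and normalization divides by the mean $\ell_2$ norm of the layer.

```python
import numpy as np

def layer_diagnostics(H, prompts, q=1.0):
    """Nearest-neighbor margin and co-Lipschitz estimates for one layer.

    H       : (n, d) array of last-token hidden states, one row per prompt.
    prompts : (n, T) integer array of token ids (assumed distinct, equal length).
    q       : worst-percentile level in percent (q=0 gives the minimum).
    All definitions below are illustrative assumptions, not the paper's.
    """
    H = np.asarray(H, dtype=np.float64)
    prompts = np.asarray(prompts)
    n = H.shape[0]

    # Exact pairwise Euclidean distances (O(n^2 d) memory; fine for a sketch).
    D = np.linalg.norm(H[:, None, :] - H[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)  # ignore self-distances

    # Nearest neighbor of each prompt in representation space.
    nn_idx = np.argmin(D, axis=1)
    nn_dist = D[np.arange(n), nn_idx]

    # Assumed prompt metric: Hamming distance to that nearest neighbor.
    ham = np.sum(prompts != prompts[nn_idx], axis=1)
    if np.any(ham == 0):
        raise ValueError("prompts must be pairwise distinct")

    margin = np.percentile(nn_dist, q)
    colip = np.percentile(nn_dist / ham, q)

    # Assumed normalization: divide by the mean l2 norm of the layer.
    mean_norm = np.mean(np.linalg.norm(H, axis=1))

    # Exact collisions: unordered pairs at distance exactly zero.
    collisions = int(np.sum(np.triu(D == 0.0, k=1)))

    return {
        "margin": margin,
        "colip": colip,
        "margin_normalized": margin / mean_norm,
        "colip_normalized": colip / mean_norm,
        "collisions": collisions,
    }
```

Calling `layer_diagnostics` once per layer on the same prompt set would give layerwise curves of the kind the abstract refers to. The brute-force distance matrix is chosen for clarity; a large prompt set would need an approximate or batched nearest-neighbor search instead.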

Paper Structure

This paper contains 34 sections, 8 theorems, 69 equations, 5 figures.

Key Result

Theorem 2.7

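Layerwise generic injectivity. For each layer $\ell$, under the definitions above, either (i) $\Delta^\ell = \Theta$, so that $F^\ell_\theta$ is injective on $\mathcal{S}$ for no $\theta \in \Theta$, or (ii) $\mathcal{U}^\ell = \Theta \setminus \Delta^\ell$ is open and dense in $\Theta$ and $F^\ell_\theta$ is injective on $\mathcal{S}$ for every $\theta \in \mathcal{U}^\ell$. In particular, if there exists at least one configuration $\theta$ for which $F^\ell_\theta$ is injective, then $F^\ell_\theta$ is generically injective on $\mathcal{S}$, i.e. injective for all $\theta$ in an open dense subset of $\Theta$.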
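
For orientation, the objects named in the theorem can be written out explicitly. The relation $\mathcal{U}^\ell = \Theta \setminus \Delta^\ell$ is stated in the abstract; the forms of the collision sets and the discriminant below are an assumed, natural reading of Definitions 2.4 and 2.5, which are not reproduced on this page and may differ in detail.

```latex
\begin{align*}
Z^\ell_{s,\tilde{s}} &= \bigl\{\theta \in \Theta : F^\ell_\theta(s) = F^\ell_\theta(\tilde{s})\bigr\},
  \qquad s \neq \tilde{s} \in \mathcal{S}, \\
\Delta^\ell &= \bigcup_{s \neq \tilde{s}} Z^\ell_{s,\tilde{s}}, \\
\mathcal{U}^\ell &= \Theta \setminus \Delta^\ell .
\end{align*}
```

Under this reading, and assuming $\Theta$ is a connected open domain, each $Z^\ell_{s,\tilde{s}}$ is the zero set of the real-analytic map $\theta \mapsto F^\ell_\theta(s) - F^\ell_\theta(\tilde{s})$, so it is either all of $\Theta$ or a closed set with empty interior. Because $\mathcal{S}$ is finite, $\Delta^\ell$ is a finite union of such sets, which is where the two alternatives come from.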

Figures (5)

  • Figure 1: Layerwise geometry of last-token representations for Llama-3.1-8B-Instruct. Margin and co-Lipschitz values are always plotted for the $q = 1\%$ worst percentile. The raw values of margins and co-Lipschitz estimates are strongly influenced by the growing $\ell_2$ norms through the layers.
  • Figure 2: Dependence of normalized last-layer margin and co-Lipschitz estimates on sequence length for models in the Qwen (0.5B–3B) and LLaMA-3 (1B–8B) families, $q = 1\%$. Longer contexts tend to reduce the worst-percentile co-Lipschitz constants, indicating more contractive behavior on hard pairs.
  • Figure 3: Layerwise normalized separation margins and co-Lipschitz estimates for models in the Qwen (0.5B–3B) and LLaMA-3 (1B–8B) families, $q = 1\%$. Normalization removes the overall growth in representation norms and reveals tight clustering within each model family, together with a consistent offset between families.
  • Figure 4: Effect of post-hoc uniform per-layer activation quantization (8- and 4-bit) on last-token representations for Llama-3.1-8B-Instruct. Top left: mean $\|h^\ell\|_2$ per layer; Top right: normalized separation margins; Bottom left: exact collision counts per layer $C_\ell$; Bottom right: normalized co-Lipschitz $\tilde{\alpha}$. Full-precision and 8-bit show no collisions on the sampled prompt set and only mild margin erosion, whereas 4-bit reduces normalized margins more substantially and introduces collisions predominantly in deeper layers, indicating a sharper approach to the discriminant under aggressive discretization.
  • Figure 5: Near-collision fraction by layer for Llama-3.1-8B-Instruct. Each heatmap shows the fraction of prompt pairs whose last-token representations are within a tolerance $\varepsilon$ (rows) at each layer (columns). Left (full precision): near-collisions appear only in early layers and only at the loose tolerance $\varepsilon=10^{-2}$; they vanish by mid-depth and are essentially zero for $\varepsilon<10^{-4}$. Right (4-bit activation quantization): near-collisions persist throughout the network and sharply increase in the final third of layers, indicating depth-amplified discretization effects. Color indicates fraction (shared scale). This pattern is consistent with shrinking safety margins under 4-bit quantization, while full precision remains well separated except at very shallow layers.
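
The collision counts $C_\ell$ under quantization (Figure 4) can be illustrated with a short Python sketch. The captions say only that the quantization is post-hoc, uniform, and per-layer; the specific scheme below (a symmetric signed grid with a single scale set by the largest absolute activation in the layer) is an assumption, and the function names `quantize_uniform` and `count_collisions` are hypothetical.

```python
import numpy as np

def quantize_uniform(H, bits):
    """Map activations to integer codes on a symmetric uniform grid.

    Assumed scheme: one scale per layer, chosen so the largest absolute
    activation maps to the top level 2**(bits-1) - 1.
    """
    H = np.asarray(H, dtype=np.float64)
    levels = 2 ** (bits - 1) - 1          # e.g. 7 for 4-bit, 127 for 8-bit
    max_abs = np.max(np.abs(H))
    if max_abs == 0.0:
        return np.zeros(H.shape, dtype=np.int64)
    scale = max_abs / levels
    return np.rint(H / scale).astype(np.int64)

def count_collisions(X):
    """Number of unordered pairs of identical rows in X."""
    _, counts = np.unique(np.asarray(X), axis=0, return_counts=True)
    return int(np.sum(counts * (counts - 1) // 2))

# Toy layer: the first two rows are distinct but close.
H = np.array([[1.00, -1.0],
              [1.04, -1.0],
              [0.50,  0.2]])

c_fp = count_collisions(H)                       # full precision
c_8 = count_collisions(quantize_uniform(H, 8))   # 8-bit codes
c_4 = count_collisions(quantize_uniform(H, 4))   # 4-bit codes
```

In this toy example the two nearby rows stay distinct in full precision and at 8 bits but fall into the same cell of the 4-bit grid. This shows the mechanism only; it says nothing about how often it occurs in a real model.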

Theorems & Definitions (23)

  • Definition 2.1: Prompt set $\mathcal{S}$
  • Definition 2.2: Parameter space $\Theta$
  • Definition 2.3: Layer-wise forward maps $F^\ell_\theta$
  • Definition 2.4: Collision sets $Z^\ell_{s,\tilde{s}}$
  • Definition 2.5: Discriminant $\Delta^\ell$
  • Definition 2.6: Injective stratum $\mathcal{U}^\ell$
  • Theorem 2.7
  • proof
  • Corollary 2.8: Simultaneous generic injectivity across layers
  • proof
  • ...and 13 more