Transformer Injectivity & Geometric Robustness - Analytic Margins and Bi-Lipschitz Uniformity of Sequence-Level Hidden States
Mikael von Strauss
TL;DR
The paper analyzes whether finite prompts map injectively to last-token hidden states in decoder-only Transformers under real-analytic assumptions. It introduces layerwise collision discriminants Δ^ℓ and injective strata U^ℓ = Θ \ Δ^ℓ, and proves a dichotomy: each layer is either nowhere injective or injective on an open dense set of parameters. Injectivity persists along smooth training trajectories when the initialization is absolutely continuous and the optimizer updates are non-singular. The theory extends to quotient spaces Θ/G to account for internal symmetries, and is paired with empirical layerwise diagnostics (separation margins and co-Lipschitz constants) that quantify how close a model is to collisions. The empirical study on pretrained LLaMA-3 and Qwen models across layers, sequence lengths, model scales, and quantization levels finds no exact collisions in full precision or at 8 bits, and a small number of collisions with markedly smaller co-Lipschitz estimates under 4-bit quantization. For a small GPT-2 trained from scratch, normalized metrics remain stable over training. Overall, the work provides a theoretical framework for generic injectivity in Transformers and a practical geometric toolkit for assessing near-invertibility of real-world models under perturbations.
Abstract
Under real-analytic assumptions on decoder-only Transformers, recent work shows that the map from discrete prompts to last-token hidden states is generically injective on finite prompt sets. We refine this picture: for each layer $\ell$ we define a collision discriminant $Δ^\ell \subset Θ$ and an injective stratum $U^\ell = Θ\setminus Δ^\ell$, and prove a dichotomy: either $F^\ell_θ$ fails to be injective on the prompt set for every $θ \in Θ$, or $U^\ell$ is open and dense and $F^\ell_θ$ is injective for every $θ \in U^\ell$. Under mild non-singularity assumptions on the optimizer and an absolutely continuous initialization, generic injectivity persists along smooth training trajectories over any fixed horizon. We also treat symmetry groups $G$, showing that discriminants and injective strata descend to the quotient $Θ/G$, so injectivity is naturally a property of functional equivalence classes. We complement these results with an empirical study of layerwise geometric diagnostics. We define a separation margin and a co-Lipschitz (lower Lipschitz) constant between prompt space and last-token representation space, both estimated via nearest-neighbor statistics on large prompt sets. Applying these diagnostics to pretrained LLaMA-3 and Qwen models, we study their behavior across layers, sequence lengths, model scales, and 8- and 4-bit activation quantization. On our sampled prompts we observe no collisions in full precision or at 8 bits, while 4-bit quantization induces a small number of collisions and markedly shrinks the co-Lipschitz estimates. For a small GPT-2 trained from scratch, normalized metrics remain stable over training. Overall, the results suggest that Transformer representations are generically and persistently injective in the continuous-parameter idealization, while their practical invertibility can be probed using simple geometric diagnostics.
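As a rough illustration of the two diagnostics named in the abstract, the sketch below computes a separation margin and a co-Lipschitz estimate on a small set of prompts. The abstract does not give the exact definitions, so everything here is an assumption: the function name `geometric_diagnostics` is hypothetical, the margin is taken to be the smallest Euclidean distance between last-token hidden states of distinct prompts, the prompt metric is taken to be Hamming distance on equal-length token sequences, and the co-Lipschitz estimate is the smallest ratio of representation distance to prompt distance. A brute-force pairwise computation stands in for the nearest-neighbor statistics mentioned in the abstract and is only practical for small prompt sets.

```python
import numpy as np

def geometric_diagnostics(prompts, hidden):
    """Toy layerwise diagnostics on a finite prompt set (assumed definitions).

    prompts: (N, T) integer array of token ids, rows assumed distinct.
    hidden:  (N, d) float array of last-token hidden states at one layer.
    Returns (margin, co_lipschitz, n_collisions).
    """
    prompts = np.asarray(prompts)
    hidden = np.asarray(hidden, dtype=float)
    n = prompts.shape[0]

    # Pairwise Euclidean distances between hidden states.
    rep = np.linalg.norm(hidden[:, None, :] - hidden[None, :, :], axis=-1)
    # Pairwise Hamming distances between prompts (assumed prompt metric).
    ham = (prompts[:, None, :] != prompts[None, :, :]).sum(axis=-1)

    # Keep each unordered pair once (strict upper triangle).
    iu = np.triu_indices(n, k=1)
    rep_pairs, ham_pairs = rep[iu], ham[iu]
    if np.any(ham_pairs == 0):
        raise ValueError("prompts must be pairwise distinct")

    margin = rep_pairs.min()                      # smallest separation
    co_lipschitz = (rep_pairs / ham_pairs).min()  # lower Lipschitz estimate
    n_collisions = int((rep_pairs == 0.0).sum())  # exact collisions
    return margin, co_lipschitz, n_collisions

# Three length-2 prompts whose hidden states are well separated.
prompts = [[0, 0], [0, 1], [1, 1]]
separated = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
print(geometric_diagnostics(prompts, separated))

# Same prompts, but two hidden states coincide (one exact collision).
collided = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]]
print(geometric_diagnostics(prompts, collided))
```

Under these assumed definitions, a margin of zero is the same thing as an exact collision, and a small but positive co-Lipschitz estimate indicates prompts that are far apart in token space yet close in representation space.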
