Unveiling Transformer Perception by Exploring Input Manifolds

Alessandro Benfenati, Alfio Ferrara, Alessio Marta, Davide Riva, Elisabetta Rocchetti

TL;DR

This work studies how Transformers perceive their inputs by treating the input space as a manifold that successive layers deform. It builds a mathematically grounded framework in which a singular pullback metric defines equivalence classes (sets of inputs that yield the same output distribution) and introduces two algorithms: SiMEC, which explores within an equivalence class, and SiMExp, which moves across classes. The method reconstructs equivalence classes and makes the explored points interpretable by mapping their embeddings back to human-readable formats; it is demonstrated on ViT and BERT models for image and text tasks. In practice, the approach gives a principled way to study model sensitivity and to generate interpretable alternative inputs inside or outside a given equivalence class, with potential applications to explainability and robustness analysis of large Transformer architectures.

Abstract

This paper introduces a general method for the exploration of equivalence classes in the input space of Transformer models. The proposed approach is based on a sound mathematical theory that describes the internal layers of a Transformer architecture as sequential deformations of the input manifold. Using the eigendecomposition of the pullback of the distance metric defined on the output space through the Jacobian of the model, we are able to reconstruct equivalence classes in the input space and navigate across them. Our method enables two complementary exploration procedures: the first retrieves input instances that produce the same class probability distribution as the original instance, thus identifying elements within the same equivalence class, while the second discovers instances that yield a different class probability distribution, effectively navigating toward distinct equivalence classes. Finally, we demonstrate how the retrieved instances can be meaningfully interpreted by projecting their embeddings back into a human-readable format.
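
The two exploration procedures described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy model (a linear map followed by softmax), the Euclidean metric on the output space, the fixed step size, and the rule of always taking the smallest or largest eigenvalue are all assumptions made here for illustration, and the names `explore`, `pullback_metric`, `W`, and `x0` are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def model(x, W):
    """Toy stand-in for a classifier: linear logits followed by softmax."""
    return softmax(W @ x)

def jacobian(x, W):
    """Analytic Jacobian of softmax(W x) with respect to x."""
    p = model(x, W)
    return (np.diag(p) - np.outer(p, p)) @ W

def pullback_metric(x, W):
    """Pull an (assumed) Euclidean output metric back to input space: g = J^T J."""
    J = jacobian(x, W)
    return J.T @ J

def explore(x0, W, steps=200, delta=1e-2, stay_in_class=True):
    """Step along an eigenvector of the pullback metric at each iterate.

    stay_in_class=True  -> smallest-eigenvalue (null) direction: output unchanged.
    stay_in_class=False -> largest-eigenvalue direction: output changes.
    """
    x = np.array(x0, dtype=float)
    prev = None
    for _ in range(steps):
        # eigh returns eigenvalues in ascending order for a symmetric matrix
        eigvals, eigvecs = np.linalg.eigh(pullback_metric(x, W))
        v = eigvecs[:, 0] if stay_in_class else eigvecs[:, -1]
        # Eigenvectors have arbitrary sign; keep the walking direction consistent
        if prev is not None and v @ prev < 0:
            v = -v
        x = x + delta * v
        prev = v
    return x

# Hypothetical two-class, three-feature example
W = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, -0.5]])
x0 = np.array([0.2, -0.1, 0.3])

x_same = explore(x0, W, stay_in_class=True)   # within the equivalence class
x_diff = explore(x0, W, stay_in_class=False)  # toward a different class
print(model(x0, W), model(x_same, W), model(x_diff, W))
```

In this toy setting the metric has rank one, so the null directions span a plane on which the class probabilities are constant, while the single non-null direction changes them. For a real Transformer the Jacobian would typically come from automatic differentiation rather than a closed form.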

Paper Structure

This paper contains 11 sections, 6 theorems, 3 equations, 2 figures, 1 table, 3 algorithms.

Key Result

Proposition 1

Let $\gamma:[0,1] \rightarrow M_i$ be a piecewise $\mathcal{C}^1$ curve. Let $j \in \{i,i+1,\cdots, n \}$ and consider the curve $\gamma_j = \Lambda_{j} \circ \cdots \circ \Lambda_i \circ \gamma$ on $M_j$. Then $Pl_i(\gamma)=Pl_j (\gamma_j)$.
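
A numerical sanity check of this invariance can be sketched for a single layer. The setup is an assumption made for illustration, not taken from the paper: the layer `layer` is a hypothetical smooth map $\mathbb{R}^3 \to \mathbb{R}^2$, the target space carries the Euclidean metric, and the source space carries its pullback $J^\top J$. Under these assumptions the pseudolength of a curve in the source space should match the ordinary length of its image.

```python
import numpy as np

# Hypothetical smooth layer Lambda: R^3 -> R^2
A = np.array([[1.0, -0.5, 0.3],
              [0.2, 0.8, -1.0]])

def layer(x):
    return np.tanh(A @ x)

def layer_jacobian(x):
    """Jacobian of tanh(A x): diag(1 - tanh^2(A x)) @ A."""
    return (1.0 - np.tanh(A @ x) ** 2)[:, None] * A

def gamma(t):
    """A smooth curve in the source space, t in [0, 1]."""
    return np.array([np.cos(t), np.sin(t), t])

def gamma_dot(t):
    return np.array([-np.sin(t), np.cos(t), 1.0])

ts = np.linspace(0.0, 1.0, 2001)

# Pseudolength in the source space: integral of sqrt(v^T g v) with g = J^T J
speeds = []
for t in ts:
    J = layer_jacobian(gamma(t))
    g = J.T @ J                      # pullback of the Euclidean metric
    v = gamma_dot(t)
    speeds.append(np.sqrt(v @ g @ v))
speeds = np.array(speeds)
pl_source = np.sum((speeds[1:] + speeds[:-1]) / 2.0 * np.diff(ts))  # trapezoid rule

# Length of the image curve in the target space, approximated by a polyline
image = np.array([layer(gamma(t)) for t in ts])
pl_target = np.sum(np.linalg.norm(np.diff(image, axis=0), axis=1))

print(pl_source, pl_target)
```

The two values should agree up to discretization error, because $\sqrt{\dot\gamma^\top J^\top J \dot\gamma} = \lVert J\dot\gamma \rVert$ is exactly the speed of the image curve $\Lambda \circ \gamma$.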

Figures (2)

  • Figure 1: (Top figure) Example of exploration on a CIFAR10 image using SiMEC and SiMExp. Left: Original image, followed by interpretation outputs of $x_{750}$ from SiMEC (middle) and SiMExp (bottom). Right top: SVD projection of the explored points $x^{(1)}, \cdots, x^{(K)}$ for SiMEC (red) and SiMExp (blue), where color intensity encodes iteration progress (darker colors correspond to later iterations), and point shapes indicate predicted class labels. Right bottom: Evolution of class probabilities over iterations, for SiMEC (left) and SiMExp (right). (Bottom figure) Example of exploration on an MHS sentence using SiMEC and SiMExp. Visualization layout and interpretation are analogous to the top figure.
  • Figure 2: Mean and standard deviation (where applicable) of probability values for the original class (solid line) and the top predicted class (dashed line) based on embeddings obtained during exploration, across iterations and datasets. Subfigure (a) depicts the behavior of SiMEC (orange) and SiMExp (blue), while subfigure (b) reports the behavior of corresponding baseline algorithms. SiMExp results in a notable decrease in the probability of the original class, while the probability of the highest-scoring class decreases to a lesser extent, indicating a shift in the most probable class.

Theorems & Definitions (12)

  • Definition 1: Neural Network
  • Definition 2: Smooth layer
  • Remark 1
  • Definition 3: Singular Riemannian metric
  • Definition 4: Pseudolength and energy of a curve
  • Definition 5: Pseudodistance
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Corollary 1
  • ...and 2 more