Permutation Equivariance of Transformers and Its Applications

Hengyuan Xu, Liyao Xiang, Hangyu Ye, Dixi Yao, Pengzhi Chu, Baochun Li

TL;DR

The work addresses how Transformer models handle permutation of inputs and parameters beyond simple inter-token shuffling by introducing permutation equivariance that covers both inter- and intra-token shuffling in forward and backward passes. It develops a formal framework with row and column permutations $P_R$ and $P_C$, proves that Transformer encoders are forward-permutation-equivariant and backward-permutation-invariant, and extends these results to general networks built from permutation-equivariant operators, with corresponding gradient mappings. Empirically, it validates these properties across ViT, BERT, and GPT2, demonstrates practical uses in privacy-preserving split learning and model authorization, and shows the approach incurs negligible computational overhead. The findings broaden the applicability of permutation properties in ordered-input tasks and offer new leverage for privacy, security, and model-protection strategies in real-world deployments.

Abstract

Transformer-based models have revolutionized deep learning and achieved remarkable performance in many tasks. Recent research has recognized that these models are robust to shuffling, but the analysis has been limited to inter-token permutation in forward propagation. In this work, we propose our definition of permutation equivariance, a broader concept covering both inter- and intra-token permutation in the forward and backward propagation of neural networks. We rigorously prove that this permutation equivariance property is satisfied by most vanilla Transformer-based models with almost no adaptation. We examine the property over a range of state-of-the-art models including ViT, BERT, GPT, and others, with experimental validation. Further, as a proof of concept, we explore how real-world applications, including privacy-enhancing split learning and model authorization, could exploit the permutation equivariance property, which suggests wider, intriguing application scenarios.

Paper Structure

This paper contains 26 sections, 16 theorems, 92 equations, 9 figures, 7 tables, 1 algorithm.

Key Result

Theorem 4.1

A Transformer encoder is permutation equivariant with respect to token permutations (that is, row permutations of the input matrix) in forward propagation: $\mathrm{Enc}({\bm{P}}_R {\bm{Z}}) = {\bm{P}}_R \mathrm{Enc}({\bm{Z}})$ for any permutation matrix ${\bm{P}}_R \in \mathbb{R}^{n \times n}$.
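
The identity in Theorem 4.1 can be checked numerically on a small example. The sketch below is an illustration only, not the paper's implementation: `toy_encoder` is a hypothetical, simplified single-head encoder layer (self-attention, residual connections, and a ReLU feed-forward block, with no layer normalization, biases, or positional encoding), and all names, sizes, and the chosen permutation are assumptions made for the demonstration.

```python
import math
import random

def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Numerically stable softmax applied to each row independently."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def toy_encoder(z, wq, wk, wv, w1, w2):
    """Hypothetical simplified encoder layer: attention + residual + feed-forward."""
    d = len(z[0])
    q, k, v = matmul(z, wq), matmul(z, wk), matmul(z, wv)
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, transpose(k))]
    h = add(z, matmul(softmax_rows(scores), v))           # attention + residual
    hidden = [[max(0.0, x) for x in row] for row in matmul(h, w1)]
    return add(h, matmul(hidden, w2))                     # feed-forward + residual

def permute_rows(m, perm):
    """Row i of the result is row perm[i] of m, i.e. P_R applied on the left."""
    return [m[i] for i in perm]

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

rng = random.Random(0)
n, d = 5, 4                                               # tokens, features (assumed sizes)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

z = rand_matrix(n, d)
weights = [rand_matrix(d, d) for _ in range(5)]           # wq, wk, wv, w1, w2
perm = [3, 0, 4, 1, 2]                                    # an arbitrary token permutation

lhs = toy_encoder(permute_rows(z, perm), *weights)        # Enc(P_R Z)
rhs = permute_rows(toy_encoder(z, *weights), perm)        # P_R Enc(Z)
row_gap = max_abs_diff(lhs, rhs)
print("Enc(P_R Z) matches P_R Enc(Z):", row_gap < 1e-9)
```

The two sides agree only up to floating-point rounding, because permuting the tokens changes the order in which the attention sums are accumulated; hence the comparison uses a tolerance rather than exact equality.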

Figures (9)

  • Figure 1: Illustration of Transformer backbone. Learnable weights in permutation are expressed by yellow blocks.
  • Figure 2: Illustration of permutation properties. ${\bm{W}}$ indicates main parameters in Transformer backbone (stacked Transformer encoders and decoders).
  • Figure 3: Reconstruction results of model inversion attacks to features. '+' means the privacy-preserving technique is enhanced by our row permutation.
  • Figure 4: Training curves of fine-tuning ViT. The authorized has a performance close to normal while the unauthorized has a high loss.
  • Figure 5: Validation loss curves of ViT trained to convergence. The unauthorized is far worse than the authorized but better than train-from-scratch.
  • ...and 4 more figures

Theorems & Definitions (25)

  • Theorem 4.1: Row Permutation Forward Equivariance
  • Theorem 4.2: Row Permutation Backward Invariance
  • Corollary 4.3
  • Theorem 4.4: Column Permutation Forward Equivariance
  • Theorem 4.5: Column Permutation Backward Equivariance
  • Corollary 4.6
  • Theorem 4.7: General Permutation Equivalent Networks
  • Lemma 4.8
  • proof
  • proof
  • ...and 15 more
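
The column (intra-token) results listed above involve permuting weights as well as inputs. Their exact statements are not reproduced in this summary, so the sketch below only illustrates the underlying linear-algebra fact for a single bias-free linear map: if the input columns are shuffled by $P_C$ and the weight matrix is replaced by $P_C^\top W P_C$, the output is the original output with its columns shuffled by $P_C$. The weight transformation, the helper names, and the sizes are assumptions made for illustration, not the paper's definitions.

```python
import random

def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def permute_rows(m, perm):
    """Row i of the result is row perm[i] of m (P^T applied on the left)."""
    return [m[i] for i in perm]

def permute_cols(m, perm):
    """Column j of the result is column perm[j] of m (P applied on the right)."""
    return [[row[j] for j in perm] for row in m]

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

rng = random.Random(1)
n, d = 3, 4                                   # tokens, features (assumed sizes)
x = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
w = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(d)]
perm = [2, 0, 3, 1]                           # an arbitrary feature permutation

x_p = permute_cols(x, perm)                   # X P_C
w_p = permute_cols(permute_rows(w, perm), perm)  # P_C^T W P_C (assumed weight mapping)

lhs = matmul(x_p, w_p)                        # (X P_C)(P_C^T W P_C)
rhs = permute_cols(matmul(x, w), perm)        # (X W) P_C
col_gap = max_abs_diff(lhs, rhs)
print("column-permuted linear map matches:", col_gap < 1e-9)
```

The check works because $P_C P_C^\top = I$, so the inner permutations cancel and only the trailing $P_C$ remains on the output.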