PIDformer: Transformer Meets Control Theory

Tam Nguyen; César A. Uribe; Tan M. Nguyen; Richard G. Baraniuk

TL;DR

This work tackles robustness and representation capacity gaps in transformer architectures by revealing self-attention as a discrete state-space evolution prone to perturbation sensitivity and rank collapse. It introduces a control framework that integrates a Proportional-Integral-Derivative (PID) controller into the state-space form, yielding a PID-controlled SSM and its discretized transformer variant, PIDformer. The authors provide theoretical guarantees showing enhanced stability and mitigated rank collapse, and they validate the approach with experiments on ImageNet, ADE20K, and WikiText-103, demonstrating improved robustness to adversarial perturbations and better preservation of token diversity. Overall, PIDformer offers a principled, energy-regularized approach to robust, detail-preserving transformers with potential impact across vision and language tasks.

Abstract

In this work, we address two main shortcomings of transformer architectures: input corruption and rank collapse in their output representation. We unveil self-attention as an autonomous state-space model that inherently promotes smoothness in its solutions, leading to lower-rank outputs and diminished representation capacity. Moreover, the steady-state solution of the model is sensitive to input perturbations. We incorporate a Proportional-Integral-Derivative (PID) closed-loop feedback control system with a reference point into the model to improve robustness and representation capacity. This integration aims to preserve high-frequency details while bolstering model stability, rendering it more noise-resilient. The resulting controlled state-space model is theoretically proven robust and adept at addressing the rank collapse. Motivated by this control framework, we derive a novel class of transformers, PID-controlled Transformer (PIDformer), aimed at improving robustness and mitigating the rank-collapse issue inherent in softmax transformers. We empirically evaluate the model for advantages and robustness against baseline transformers across various practical tasks, including object classification, image segmentation, and language modeling.
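The rank-collapse mechanism and the effect of feedback toward a reference can be illustrated with a small toy simulation. The sketch below is not the authors' implementation: it assumes a fixed softmax attention matrix `K`, an Euler discretization of dV/dt = (K - I)V, and only the proportional (P) term of the controller, with the input tokens as the reference. The function names, the gain `lam_p`, the step size `dt`, and the toy tokens are all hypothetical choices made for illustration; the integral and derivative terms are omitted.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_matrix(tokens):
    # Row-stochastic softmax attention built once from the input tokens
    # (assumption: queries = keys = tokens, and K stays fixed over time).
    d = len(tokens[0])
    scores = [[sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
              for q in tokens]
    return [softmax(r) for r in scores]

def apply(K, V):
    # Matrix product K @ V for lists of rows.
    return [[sum(K[i][j] * V[j][c] for j in range(len(V)))
             for c in range(len(V[0]))] for i in range(len(K))]

def evolve(V0, K, lam_p, dt=0.5, steps=200):
    # Euler steps of dV/dt = (K - I) V + lam_p * (F - V), with reference F = V0.
    # lam_p = 0 recovers the uncontrolled smoothing dynamics.
    V = [row[:] for row in V0]
    for _ in range(steps):
        KV = apply(K, V)
        V = [[v + dt * ((kv - v) + lam_p * (f - v))
              for v, kv, f in zip(v_row, kv_row, f_row)]
             for v_row, kv_row, f_row in zip(V, KV, V0)]
    return V

def mean_cosine(V):
    # Average cosine similarity over all distinct token pairs.
    sims = []
    for i in range(len(V)):
        for j in range(i + 1, len(V)):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            ni = math.sqrt(sum(a * a for a in V[i]))
            nj = math.sqrt(sum(b * b for b in V[j]))
            sims.append(dot / (ni * nj))
    return sum(sims) / len(sims)

tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
K = attention_matrix(tokens)
sim_plain = mean_cosine(evolve(tokens, K, lam_p=0.0))  # tokens collapse together
sim_ctrl = mean_cosine(evolve(tokens, K, lam_p=1.0))   # feedback keeps them apart
print(sim_plain, sim_ctrl)
```

In this toy setting the uncontrolled tokens converge to a common vector, so their mean cosine similarity approaches 1, while the proportional feedback term pulls each token back toward its own reference and keeps the similarity lower.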

Paper Structure (37 sections, 7 theorems, 65 equations, 3 figures, 3 tables)

Introduction
Background: Self-Attention
Contribution
A Control Framework for Self-Attention
Connection between State Space Model and Nonlocal Variational Minimization
Stability and Representation Collapse of the State Space Model
Transformer with PID-Controller for State-Space Representation
Connection between (P) and (I) Components with Different Optimization Methods
Stability and Representation Collapse of PID-Controlled State Space Model
Analysis of P-control SSM
Analysis of PD-controlled SSM
Analysis of PID-controlled SSM
Transformer with PID Control
Experimental Results
Related Work
...and 22 more sections

Key Result

Lemma 1

Let $\{\alpha_1, \alpha_2, \dots, \alpha_M\}$, $M \leq N$, be the complex spectrum of $\bm{K} - \bm{I} \in \mathbb{R}^{N \times N}$. The solution of the ordinary differential equation (ODE) (eq:ode1) is given by $\bm{V}(t) = \bm{P}\exp(\bm{J}t)\bm{P}^{-1}\bm{V}(0)$, where $\bm{P}\bm{J}\bm{P}^{-1}$ is the Jordan decomposition of $\bm{K} - \bm{I}$, $\bm{P}$ is invertible and contains the generalized eigenvectors of $\bm{K} - \bm{I}$, and ...
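As a numerical illustration of the lemma's setting, the sketch below evaluates the solution of a linear ODE through the matrix exponential, which coincides with the Jordan-form expression because exp((K - I)t) = P exp(Jt) P^{-1}. Since (eq:ode1) is not reproduced here, the code assumes it has the linear form dV/dt = (K - I)V with a row-stochastic K. The matrix `K`, the initial state `V0`, and the helper names are hypothetical, and the scaling-and-squaring Taylor routine is a simple stand-in for a library matrix exponential.

```python
def matmul(A, B):
    # Plain matrix product for lists of rows.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm(A, t, terms=20):
    # exp(A * t) via scaling and squaring with a truncated Taylor series.
    n = len(A)
    norm = max(sum(abs(x) for x in row) for row in A) * abs(t)
    s = 0
    while norm > 0.5:          # halve until the scaled matrix is small
        norm /= 2.0
        s += 1
    scale = t / (2 ** s)
    M = [[x * scale for x in row] for row in A]
    result = identity(n)
    term = identity(n)
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, M)]  # M^k / k!
        result = [[r + x for r, x in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    for _ in range(s):         # undo the scaling by repeated squaring
        result = matmul(result, result)
    return result

# Hypothetical row-stochastic attention matrix; K - I has eigenvalues 0, -0.6, -0.7.
K = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]
A = [[K[i][j] - (1.0 if i == j else 0.0) for j in range(3)] for i in range(3)]
V0 = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

V_late = matmul(expm(A, 30.0), V0)  # V(t) = exp((K - I) t) V(0) at t = 30
row_gap = max(abs(V_late[i][c] - V_late[0][c]) for i in range(3) for c in range(2))
print(V_late, row_gap)
```

In this example the zero eigenvalue of K - I dominates at large t, so all rows of V(t) approach the same vector, which matches the loss of token diversity described in the abstract.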

Figures (3)

Figure 1: Our proposed PIDformer model at each layer.
Figure 2: The cosine similarity of token representations in PID DeiT compared to baseline DeiT models across layers for ImageNet classification. The DeiT baseline demonstrates representation rank collapse as tokens become increasingly similar as depth increases. In contrast, PID DeiT models exhibit significantly greater diversity in tokens, indicating a mitigation in rank-collapse.
Figure 3: The top-1 classification accuracy curves on ImageNet against FGSM and PGD attack methods, plotted against perturbation budgets (scaled by 255).
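For readers unfamiliar with the attacks named in Figure 3, the following sketch shows the standard FGSM and PGD perturbation rules on a toy logistic-regression classifier. It is not the paper's evaluation code: the model, weights, input, and step sizes are hypothetical, and the convention that a budget of k corresponds to eps = k/255 on inputs in [0, 1] is an assumption about how the "scaled by 255" axis is defined.

```python
import math

def loss_and_grad(x, y, w, b):
    # Binary cross-entropy of a logistic model and its gradient w.r.t. the input x.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad = [(p - y) * wi for wi in w]
    return loss, grad

def sign(v):
    return (v > 0) - (v < 0)

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def fgsm(x, y, w, b, eps):
    # Single step of size eps along the sign of the input gradient.
    _, g = loss_and_grad(x, y, w, b)
    return [clip(xi + eps * sign(gi), 0.0, 1.0) for xi, gi in zip(x, g)]

def pgd(x, y, w, b, eps, step, iters):
    # Repeated small signed-gradient steps, each projected back onto the
    # L-infinity ball of radius eps around x and onto the valid range [0, 1].
    x_adv = list(x)
    for _ in range(iters):
        _, g = loss_and_grad(x_adv, y, w, b)
        x_adv = [xa + step * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [clip(clip(xa, xi - eps, xi + eps), 0.0, 1.0)
                 for xa, xi in zip(x_adv, x)]
    return x_adv

# Hypothetical toy model and input (values in [0, 1], like normalized pixels).
w, b = [2.0, -3.0, 1.0], 0.1
x, y = [0.5, 0.2, 0.7], 1
eps = 4 / 255  # assumed meaning of a perturbation budget of 4

clean_loss, _ = loss_and_grad(x, y, w, b)
x_fgsm = fgsm(x, y, w, b, eps)
x_pgd = pgd(x, y, w, b, eps, step=1 / 255, iters=10)
fgsm_loss, _ = loss_and_grad(x_fgsm, y, w, b)
pgd_loss, _ = loss_and_grad(x_pgd, y, w, b)
print(clean_loss, fgsm_loss, pgd_loss)
```

Both attacks keep every coordinate within eps of the original input while increasing the loss; robustness curves such as those in Figure 3 report accuracy under perturbations of this kind as the budget grows.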

Theorems & Definitions (8)

Lemma 1
Lemma 2
Lemma 3
Lemma 4
Proposition 1
Lemma 5
Proposition 2
Definition 1: PID-control Transformer (PIDformer)
