PIDformer: Transformer Meets Control Theory
Tam Nguyen, César A. Uribe, Tan M. Nguyen, Richard G. Baraniuk
TL;DR
This work tackles robustness and representation capacity gaps in transformer architectures by revealing self-attention as a discrete state-space evolution prone to perturbation sensitivity and rank collapse. It introduces a control framework that integrates a Proportional-Integral-Derivative (PID) controller into the state-space form, yielding a PID-controlled SSM and its discretized transformer variant, PIDformer. The authors provide theoretical guarantees showing enhanced stability and mitigated rank collapse, and they validate the approach with experiments on ImageNet, ADE20K, and WikiText-103, demonstrating improved robustness to adversarial perturbations and better preservation of token diversity. Overall, PIDformer offers a principled, energy-regularized approach to robust, detail-preserving transformers with potential impact across vision and language tasks.
Abstract
In this work, we address two main shortcomings of transformer architectures: sensitivity to input corruption and rank collapse in their output representation. We unveil self-attention as an autonomous state-space model that inherently promotes smoothness in its solutions, leading to lower-rank outputs and diminished representation capacity. Moreover, the steady-state solution of the model is sensitive to input perturbations. We incorporate a Proportional-Integral-Derivative (PID) closed-loop feedback control system with a reference point into the model to improve robustness and representation capacity. This integration aims to preserve high-frequency details while bolstering model stability, rendering it more noise-resilient. The resulting controlled state-space model is theoretically proven to be robust and to address rank collapse. Motivated by this control framework, we derive a novel class of transformers, PID-controlled Transformer (PIDformer), aimed at improving robustness and mitigating the rank-collapse issue inherent in softmax transformers. We empirically compare the model with baseline transformers, evaluating its accuracy and robustness across various practical tasks, including object classification, image segmentation, and language modeling.
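The smoothing effect and the role of the feedback loop described in the abstract can be illustrated with a small numerical toy. The sketch below is not the authors' PIDformer layer. It assumes weight-free self-attention (queries, keys, and values all equal to the current tokens), takes the initial tokens as the reference point, and uses hand-picked gains `kp`, `ki`, `kd`. The helper names (`attention_step`, `token_spread`, `run`) are hypothetical. It compares how far tokens stay from their centroid with and without a PID-style correction added to each attention update.

```python
import numpy as np

def softmax(x):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention_step(v):
    # One weight-free self-attention update: Q = K = V = v (toy assumption).
    scores = v @ v.T / np.sqrt(v.shape[1])
    return softmax(scores) @ v

def token_spread(v):
    # Mean distance of tokens from their centroid; 0 means all tokens coincide.
    return float(np.linalg.norm(v - v.mean(axis=0), axis=1).mean())

def run(v0, steps, kp=0.0, ki=0.0, kd=0.0):
    # Iterate attention with an optional PID-style correction toward the
    # reference. The reference is the initial tokens (assumption of this sketch).
    ref = v0.copy()
    v = v0.copy()
    integral = np.zeros_like(v0)
    prev_err = np.zeros_like(v0)
    for _ in range(steps):
        err = ref - v                  # proportional term: current deviation
        integral += err                # integral term: accumulated deviation
        deriv = err - prev_err         # derivative term: change in deviation
        v = attention_step(v) + kp * err + ki * integral + kd * deriv
        prev_err = err
    return v

rng = np.random.default_rng(0)
v0 = 0.5 * rng.normal(size=(16, 8))    # 16 toy tokens of dimension 8

spread_initial = token_spread(v0)
spread_plain = token_spread(run(v0, steps=50))
spread_pid = token_spread(run(v0, steps=50, kp=0.5, ki=0.2, kd=0.05))

print(f"initial spread:      {spread_initial:.4f}")
print(f"plain attention:     {spread_plain:.4f}")
print(f"with PID correction: {spread_pid:.4f}")
```

In this toy setting, repeated plain attention pulls all tokens toward a common point, which is the low-rank behavior the abstract refers to, while the feedback terms keep the tokens spread out around the reference. The gains here are illustrative only; the paper's discretization and learned parameters may differ.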
