SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization

Xixu Hu; Runkai Zheng; Jindong Wang; Cheuk Hang Leung; Qi Wu; Xing Xie

TL;DR

This work targets adversarial robustness of Vision Transformers by deriving local Lipschitz bounds for the self-attention mechanism and proposing Maximum Singular Value Penalization (MSVP) to bound these Lipschitz constants. SpecFormer integrates MSVP into attention layers and uses power iteration to efficiently estimate spectral norms, improving stability without sacrificing training efficiency. Extensive experiments on CIFAR-10/100, Imagenette, and ImageNet across multiple ViT variants show state-of-the-art robustness under standard and adversarial training, with notable gains against FGSM, PGD, CW, and AutoAttack while preserving or improving clean accuracy. The approach provides a theoretically grounded, simple, and broadly applicable method for enhancing ViT robustness in practical settings, with code available at the provided repository.
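The power-iteration step mentioned above can be illustrated with a minimal pure-Python sketch. It estimates the largest singular value of a weight matrix by repeatedly applying $W^\top W$ to a unit vector. The function names (`max_singular_value`, `msvp_penalty`), the coefficient `coeff`, and the choice to sum the estimates over a list of weight matrices are assumptions made for illustration; the exact penalty form used in SpecFormer is not given in this summary.

```python
import math
import random


def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def max_singular_value(W, num_iters=100, seed=0):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    rng = random.Random(seed)
    Wt = transpose(W)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(W[0]))]
    n = norm(v)
    v = [x / n for x in v]
    for _ in range(num_iters):
        u = matvec(W, v)       # u = W v
        v = matvec(Wt, u)      # v = W^T W v
        n = norm(v)
        if n == 0.0:           # W maps v to zero; nothing left to iterate on
            return 0.0
        v = [x / n for x in v]
    return norm(matvec(W, v))  # ||W v|| for a unit v approximates sigma_max


def msvp_penalty(weights, coeff=1e-3):
    """Hypothetical penalty: coeff times the sum of estimated spectral norms."""
    return coeff * sum(max_singular_value(W) for W in weights)


W = [[3.0, 0.0], [0.0, 1.0]]  # singular values are 3 and 1
sigma = max_singular_value(W)
```

In a training loop, a term like `msvp_penalty([W_q, W_k, W_v])` would be added to the task loss, where `W_q`, `W_k`, and `W_v` stand for the attention projection matrices. In practice far fewer iterations per step are usually needed, since the estimate vector can be carried over between steps.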

Abstract

Vision Transformers (ViTs) are increasingly used in computer vision due to their high performance, but their vulnerability to adversarial attacks is a concern. Existing methods lack a solid theoretical basis, focusing mainly on empirical training adjustments. This study introduces SpecFormer, tailored to fortify ViTs against adversarial attacks, with theoretical underpinnings. We establish local Lipschitz bounds for the self-attention layer and propose the Maximum Singular Value Penalization (MSVP) to precisely manage these bounds. By incorporating MSVP into ViTs' attention layers, we enhance the model's robustness without compromising training efficiency. SpecFormer, the resulting model, outperforms other state-of-the-art models in defending against adversarial attacks, as shown by experiments on CIFAR and ImageNet datasets. Code is released at https://github.com/microsoft/robustlearn.

Paper Structure (23 sections, 5 theorems, 37 equations, 2 figures, 8 tables, 1 algorithm)

Introduction
Related Work
Adversarial Robustness
Robustness of Vision Transformer
Preliminaries
Theoretical Analysis
Re-examining Self-Attention as a Product of Linear Mapping Operations
Comparison with Existing Bounds
SpecFormer
Maximum Singular Value Penalization
Power Iteration
Experiments
Setup
Main Results
Analyzing the efficacy of MSVP
...and 8 more sections

Key Result

Theorem (Calculation of Local Lipschitz Constant; Federer, 1969)

Let $f: \mathcal{X} \rightarrow \mathbb{R}^m$ be differentiable and locally Lipschitz continuous under a choice of $p$-norm $\|\cdot\|_p$. Let $\mathbf{J}_f(\mathbf{x})$ denote its total derivative (Jacobian) at $\mathbf{x}$. Then $\mathrm{Lip}_p(f, \mathcal{X}) = \sup_{\mathbf{x} \in \mathcal{X}} \|\mathbf{J}_f(\mathbf{x})\|_p$, where $\|\mathbf{J}_f(\mathbf{x})\|_p$ is the induced operator norm on $\mathbf{J}_f(\mathbf{x})$.
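As a small numerical illustration of this result, the sketch below takes the scalar function $f(x) = x^2$ on the interval $[0, 2]$ (an example chosen here, not taken from the paper) and compares the largest difference quotient on a grid with the supremum of the Jacobian norm $|f'(x)| = |2x|$. Every quotient should stay below that supremum and approach it as the grid is refined.

```python
def f(x):
    return x * x


def jacobian_norm(x):
    # For scalar f the Jacobian is f'(x) = 2x, and every induced p-norm is |f'(x)|.
    return abs(2.0 * x)


def grid(lo, hi, n):
    """n evenly spaced points on [lo, hi], endpoints included."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


points = grid(0.0, 2.0, 201)

# Supremum of the Jacobian norm over the domain (attained at x = 2).
sup_jacobian = max(jacobian_norm(x) for x in points)

# Largest observed ratio |f(x) - f(y)| / |x - y| over all distinct grid pairs.
max_quotient = max(
    abs(f(x) - f(y)) / abs(x - y)
    for x in points
    for y in points
    if x != y
)

print(sup_jacobian, max_quotient)
```

Restricting the domain matters: on a larger interval the same function has a larger constant, which is why the bound is local.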

Figures (2)

Figure 1: SpecFormer with MSVP.
Figure 2: The analysis of MSVP. (a) Maximum singular value comparison between MSVP and the vanilla Transformer. (b) and (c) t-SNE (van der Maaten and Hinton, 2008) feature visualization.

Theorems & Definitions (9)

Definition: Local Lipschitz Continuity
Definition: Local Lipschitz Constant
Theorem: Calculation of Local Lipschitz Constant (Federer, 1969)
Proposition: Model Sensitivity and Maximum Singular Value in Linear Models (Yoshida and Miyato, 2017)
Theorem
Proof
Lemma: Relationship between the spectral norm of a block row and the block matrix (Kim et al., 2021)
Theorem: Convergence guarantee of the power iteration method (von Mises and Pollaczek-Geiringer, 1929)
Proof
