SAFER: Sharpness Aware layer-selective Finetuning for Enhanced Robustness in vision transformers

Bhavna Gopal, Huanrui Yang, Mark Horton, Yiran Chen

TL;DR

SAFER addresses adversarial overfitting in vision transformers by identifying the small subset of layers most prone to overfitting and selectively finetuning them with sharpness-aware minimization. By freezing the remaining layers, SAFER stabilizes optimization and improves both clean and adversarial accuracy across ViT, DeiT, and Swin architectures, with typical gains around 5% and peak gains of up to 20%. The method is compatible with PEFT approaches like LoRA and DoRA, and demonstrates robust performance under white-box and black-box attacks, including AutoAttack, while incurring minimal overhead. This layer-selective strategy provides a practical path to more robust ViTs in real-world settings without full-model retraining.

Abstract

Vision transformers (ViTs) have become essential backbones in advanced computer vision applications and multi-modal foundation models. Despite their strengths, ViTs remain vulnerable to adversarial perturbations, to a degree comparable to or even exceeding that of convolutional neural networks (CNNs). Furthermore, the large parameter count and complex architecture of ViTs make them particularly prone to adversarial overfitting, often compromising both clean and adversarial accuracy. This paper mitigates adversarial overfitting in ViTs through a novel, layer-selective fine-tuning approach: SAFER. Instead of optimizing the entire model, we identify and selectively fine-tune a small subset of layers most susceptible to overfitting, applying sharpness-aware minimization to these layers while freezing the rest of the model. Our method consistently enhances both clean and adversarial accuracy over baseline approaches. Typical improvements are around 5%, with some cases achieving gains as high as 20% across various ViT architectures and datasets.
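
The layer-selective idea described in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes per-layer sharpness scores have already been computed (the paper's own sharpness measure is not reproduced here) and uses the standard two-step sharpness-aware minimization update: perturb the weights along the normalized gradient, then descend using the gradient at the perturbed point. The function names `select_sharpest_layers` and `sam_step`, the toy quadratic loss, the score values, and the settings `k=2`, `lr=0.1`, `rho=0.05` are all illustrative assumptions rather than the authors' implementation, and the generation of adversarial examples is omitted.

```python
import math


def select_sharpest_layers(sharpness, k):
    """Return the names of the k layers with the largest sharpness scores."""
    ranked = sorted(sharpness, key=sharpness.get, reverse=True)
    return ranked[:k]


def sam_step(params, grad_fn, trainable, lr=0.1, rho=0.05):
    """One SAM update applied only to the layers named in `trainable`.

    params:  dict mapping layer name -> list of float weights
    grad_fn: callable returning a dict of gradients with the same layout
    """
    grads = grad_fn(params)
    # Gradient norm over the trainable layers only; frozen layers are ignored.
    norm = math.sqrt(sum(g * g for name in trainable for g in grads[name]))
    scale = rho / (norm + 1e-12)

    # Ascent step: move the trainable weights toward the nearby worst case.
    perturbed = {name: list(w) for name, w in params.items()}
    for name in trainable:
        perturbed[name] = [w + scale * g
                           for w, g in zip(params[name], grads[name])]

    # Descent step: apply the perturbed-point gradient to the original weights.
    sharp_grads = grad_fn(perturbed)
    updated = {name: list(w) for name, w in params.items()}
    for name in trainable:
        updated[name] = [w - lr * g
                         for w, g in zip(params[name], sharp_grads[name])]
    return updated


# Hypothetical sharpness scores for four transformer blocks (made-up values).
scores = {"block0": 0.2, "block1": 1.7, "block2": 0.4, "block3": 1.1}
selected = select_sharpest_layers(scores, k=2)


def toy_grad(params):
    # Toy loss 0.5 * sum(w^2) stands in for the adversarial loss; its gradient is w.
    return {name: list(w) for name, w in params.items()}


params = {name: [1.0, -2.0] for name in scores}
new_params = sam_step(params, toy_grad, selected)
```

In this toy run only the two selected blocks change, while the frozen blocks keep their original weights. In a real framework the same effect would typically be obtained by disabling gradients on the frozen layers and passing only the selected layers' parameters to the optimizer.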

Paper Structure (29 sections, 7 equations, 4 figures, 11 tables)

Figures (4)

  • Figure 1: Sharpness value (Eq. \ref{equ:layersharpsim}) measured over the layers of an adversarially-trained DeiT-Tiny model (bottom), where the two layers with the highest values are selected for SAFER finetuning. The layers' adversarial loss landscape before adversarial finetuning (left) and after SAFER finetuning (right) is visualized at the top.
  • Figure 2: SAFER performance at different starting points on CIFAR-10 with DeiT-Ti: clean (left) vs. adversarial (right) accuracies.
  • Figure 3: Performance comparison of DeiT-Ti and ViT-S as a function of the number of sharp layers selected for SAFER finetuning. The top row shows CIFAR-10 clean and adversarial accuracy, while the bottom row shows Imagenette results. The value "0" on the x-axis corresponds to PGD-AT (SAM) without SAFER, where fine-tuning is performed on the entire model. The highest performance points are highlighted in blue for DeiT-Ti and red for ViT-S.
  • Figure 4: SAFER vs. PGD-AT (SAM) performance on CIFAR-10 with DeiT-Ti: clean (left) vs. adversarial (right) accuracies.