Criticality Leveraged Adversarial Training (CLAT) for Boosted Performance via Parameter Efficiency

Bhavna Gopal, Huanrui Yang, Jingyang Zhang, Mark Horton, Yiran Chen

TL;DR

CLAT tackles robust overfitting in adversarial training (AT) by identifying robustness-critical layers with a Layer Criticality Index and fine-tuning only those layers. The remaining, non-critical layers are frozen, which cuts the number of trainable parameters by approximately 95% and improves adversarial robustness by more than 2% over baseline methods while also improving clean accuracy. The method can be applied on top of existing AT baselines and periodically re-evaluates which layers are critical to track changes during training, with minimal computational overhead. Empirical results on CIFAR-10/100 across several architectures show consistent gains in both white-box and black-box robustness.

Abstract

Adversarial training enhances neural network robustness but suffers from a tendency to overfit and from increased generalization error on clean data. This work introduces CLAT, an innovative approach that mitigates adversarial overfitting by introducing parameter efficiency into the adversarial training process, improving both clean accuracy and adversarial robustness. Instead of tuning the entire model, CLAT identifies and fine-tunes robustness-critical layers (those predominantly learning non-robust features) while freezing the rest of the model to enhance robustness. It employs dynamic critical layer selection to adapt to changes in layer criticality throughout the fine-tuning process. Empirically, CLAT can be applied on top of existing adversarial training methods, reduces the number of trainable parameters by approximately 95%, and achieves more than a 2% improvement in adversarial robustness compared to baseline methods.
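The freeze-and-fine-tune idea described in the abstract can be illustrated with a small, framework-free sketch. It is not the authors' implementation: the `Layer` class, the helper names, and the parameter counts are all hypothetical, chosen only to show how freezing non-critical layers shrinks the trainable parameter count.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Layer:
    """Hypothetical stand-in for a network layer (not a real library type)."""
    name: str
    num_params: int
    trainable: bool = True


def freeze_non_critical(layers: List[Layer], critical_names: Iterable[str]) -> None:
    """Mark only the layers named in critical_names as trainable."""
    critical = set(critical_names)
    for layer in layers:
        layer.trainable = layer.name in critical


def trainable_fraction(layers: List[Layer]) -> float:
    """Fraction of all parameters that remain trainable."""
    total = sum(layer.num_params for layer in layers)
    trainable = sum(layer.num_params for layer in layers if layer.trainable)
    return trainable / total


# Toy parameter counts, assumed for illustration only (1000 parameters total).
toy_model = [
    Layer("stem", 50),
    Layer("block1", 450),
    Layer("block2", 450),
    Layer("head", 50),
]

# Assume "head" was selected as the critical layer; everything else is frozen.
freeze_non_critical(toy_model, ["head"])
fraction = trainable_fraction(toy_model)
print(f"trainable fraction: {fraction:.2f}")
```

In a real framework the same step would typically be expressed by disabling gradient updates for the frozen layers; the sketch only tracks a boolean flag.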

Paper Structure

This paper contains 42 sections, 1 theorem, 10 equations, 7 figures, 20 tables, 1 algorithm.

Key Result

Proposition 3.2

Critical layers, as defined in Definition 3.1, can be identified as the layers with the largest criticality indices, $\arg\max_i \mathcal{C}_{f_i}$.
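The selection rule in the proposition amounts to ranking layers by their criticality index and keeping the largest ones. The sketch below assumes the indices have already been computed; the function name, the layer names, and the index values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def select_critical_layers(criticality: Dict[str, float], k: int = 1) -> List[str]:
    """Return the names of the k layers with the largest criticality index."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort layer names by their index, largest first, and keep the top k.
    ranked = sorted(criticality, key=criticality.get, reverse=True)
    return ranked[:k]


# Hypothetical criticality indices, for illustration only.
example_indices = {
    "conv1": 0.8,
    "block1": 1.3,
    "block2": 2.1,
    "block3": 1.7,
    "fc": 0.9,
}

top_two = select_critical_layers(example_indices, k=2)
print(top_two)  # ['block2', 'block3']
```

With `k=1` this reduces to the plain argmax over layers; larger `k` corresponds to fine-tuning several critical layers at once.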

Figures (7)

  • Figure 1: CLAT overview. CLAT fine-tunes the selected critical layers (red) while freezing other layers (grey). The fine-tuning objective is computed per the overall objective equation. Critical layers are adjusted periodically. Pseudocode is provided in the appendix.
  • Figure 2: Comparative analysis of CLAT performance on WRN34-10: Clean and adversarial accuracy on CIFAR-10 across partially trained models.
  • Figure 3: Comparative analysis of CLAT performance/PGD-10 adversarial accuracy with respect to the number of critical layers used during CLAT.
  • Figure 4: White-box adversarial accuracy (y-axis) on CIFAR-10 for models trained with CLAT (red) and PGD-AT (blue), against PGD attacks of varying strengths (x-axis).
  • Figure 5: White-box PGD-10 adversarial accuracy (y-axis) on CIFAR-10 for WRN34-10 models trained with CLAT fine-tuning starting at Epoch 70 (red), CLAT from scratch (orange), and PGD-AT (blue). The learning rate decays to 0 by Epoch 150.
  • ...and 2 more figures

Theorems & Definitions (2)

  • Definition 3.1
  • Proposition 3.2