Criticality Leveraged Adversarial Training (CLAT) for Boosted Performance via Parameter Efficiency
Bhavna Gopal, Huanrui Yang, Jingyang Zhang, Mark Horton, Yiran Chen
TL;DR
CLAT tackles robust overfitting in adversarial training by using a Layer Criticality Index to identify robustness-critical layers and fine-tuning only those layers. It minimizes layer-specific weakness while freezing non-critical layers, cutting the number of trainable parameters by about 95% and improving clean and adversarial accuracy by over 2% across multiple architectures and datasets. The method is compatible with existing adversarial training (AT) baselines and periodically re-evaluates which layers are critical so that it adapts to training dynamics, with minimal computational overhead. Empirical results on CIFAR-10/100 with varied architectures show consistent gains in both white-box and black-box robustness, highlighting CLAT's practical value for efficient, robust learning.
Abstract
Adversarial training enhances neural network robustness but tends to overfit and to increase generalization error on clean data. This work introduces CLAT, an approach that mitigates adversarial overfitting by introducing parameter efficiency into the adversarial training process, improving both clean accuracy and adversarial robustness. Instead of tuning the entire model, CLAT identifies and fine-tunes robustness-critical layers (those predominantly learning non-robust features) while freezing the rest of the model. It employs dynamic critical layer selection to adapt to changes in layer criticality throughout fine-tuning. Empirically, CLAT can be applied on top of existing adversarial training methods, reduces the number of trainable parameters by approximately 95%, and achieves more than a 2% improvement in adversarial robustness compared to baseline methods.
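The select-and-freeze idea in the abstract can be illustrated with a small, framework-free sketch. It is not the authors' implementation: the `Layer` class, the `select_critical` and `trainable_fraction` helpers, the per-layer `criticality` scores, and the parameter counts are all hypothetical placeholders, and the abstract does not say how the Layer Criticality Index is computed. The sketch only shows how fine-tuning a small set of top-scoring layers shrinks the trainable parameter count, and how re-running the selection changes the trainable set when scores shift.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    num_params: int
    criticality: float  # hypothetical score; higher means more robustness-critical
    trainable: bool = True


def select_critical(layers, k):
    """Mark the k highest-scoring layers trainable and freeze all others."""
    ranked = sorted(layers, key=lambda layer: layer.criticality, reverse=True)
    critical = {layer.name for layer in ranked[:k]}
    for layer in layers:
        layer.trainable = layer.name in critical
    return critical


def trainable_fraction(layers):
    """Fraction of all parameters that belong to trainable layers."""
    total = sum(layer.num_params for layer in layers)
    active = sum(layer.num_params for layer in layers if layer.trainable)
    return active / total


# Made-up layer sizes and scores, chosen only for illustration.
layers = [
    Layer("conv1", 2_000, 0.10),
    Layer("block1", 150_000, 0.90),
    Layer("block2", 600_000, 0.30),
    Layer("block3", 2_400_000, 0.20),
    Layer("fc", 5_000, 0.05),
]

critical = select_critical(layers, k=1)
frac = trainable_fraction(layers)
print(sorted(critical), f"{frac:.2%} of parameters trainable")

# Dynamic selection: if the scores shift later in fine-tuning,
# re-running the selection moves the trainable set.
layers[2].criticality = 0.95
critical_later = select_critical(layers, k=1)
print(sorted(critical_later))
```

In a deep learning framework, the `trainable` flag would correspond to whatever mechanism disables gradient updates for a layer, for example setting `requires_grad` to `False` on its parameters in PyTorch.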
