Narrowing Class-Wise Robustness Gaps in Adversarial Training
Fatemeh Amerehi, Patrick Healy
TL;DR
This paper investigates how adversarial training affects both overall robustness and class-wise performance, revealing a trade-off: higher adversarial robustness can reduce clean accuracy and widen performance imbalances across classes. It proposes Label Augmentation (LA), which concatenates original class labels with transformation labels during training, and shows that incorporating LA into adversarial training improves adversarial robustness by up to $53.50\%$ while mitigating class-imbalance effects by about $5.73\%$. Evaluated on ImageNet, ImageNet-C, IN-ReaL, and IN-X with $10$-step PGD training and a perturbation budget of $\varepsilon = 0.03$, the approach achieves more balanced performance across clean and adversarial settings, albeit with some loss of corruption robustness for certain augmentations. The results point to a practical, easy-to-implement way to narrow class-wise robustness gaps and improve robustness without severely compromising accuracy under distribution shifts.
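To make the label concatenation concrete, the following is a minimal sketch of how an augmented target could be built. It assumes that both the class label and the transformation label are one-hot encoded before concatenation; the text above only states that the two labels are concatenated, so the encoding, the helper names `one_hot` and `augment_labels`, and the example sizes are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def one_hot(indices, depth):
    """Return a (len(indices), depth) one-hot matrix."""
    indices = np.asarray(indices)
    out = np.zeros((indices.size, depth), dtype=np.float32)
    out[np.arange(indices.size), indices] = 1.0
    return out

def augment_labels(class_ids, transform_ids, num_classes, num_transforms):
    """Concatenate one-hot class labels with one-hot transformation labels.

    Assumption: one-hot encoding for both parts. Each row has length
    num_classes + num_transforms and contains exactly two ones.
    """
    class_part = one_hot(class_ids, num_classes)
    transform_part = one_hot(transform_ids, num_transforms)
    return np.concatenate([class_part, transform_part], axis=1)

# Hypothetical example: 3 classes, 4 transformations, batch of 2 images.
targets = augment_labels([2, 0], [1, 3], num_classes=3, num_transforms=4)
print(targets.shape)
```

Under this encoding, a model trained with LA would need an output layer of width `num_classes + num_transforms`. The choice of loss over the concatenated target is not specified in this section and is left open here.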
Abstract
Efforts to address declining accuracy as a result of data shifts often involve various data-augmentation strategies. Adversarial training is one such method, designed to improve robustness to worst-case distribution shifts caused by adversarial examples. While this method can improve robustness, it may also hinder generalization to clean examples and exacerbate performance imbalances across different classes. This paper explores the impact of adversarial training on both overall and class-specific performance, as well as its spill-over effects. We observe that enhanced labeling during training boosts adversarial robustness by 53.50% and mitigates class imbalances by 5.73%, leading to improved accuracy in both clean and adversarial settings compared to standard adversarial training.
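For readers unfamiliar with how adversarial examples are generated during adversarial training, the sketch below illustrates a 10-step PGD attack with $\varepsilon = 0.03$, the settings mentioned in the TL;DR. Several details are assumptions made only for illustration: the $L_\infty$ norm, the step size, the random start, and the toy linear softmax classifier (chosen so the input gradient can be written in closed form without a deep-learning framework). None of these are confirmed by the text.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def input_gradient(W, b, x, y_onehot):
    """Per-example gradient of the cross-entropy loss w.r.t. the input x
    for a linear softmax classifier with logits x @ W + b."""
    p = softmax(x @ W + b)
    return (p - y_onehot) @ W.T

def pgd_linf(W, b, x, y_onehot, eps=0.03, step_size=0.0075, steps=10, seed=0):
    """L-infinity PGD sketch (norm, step size, random start are assumptions)."""
    rng = np.random.default_rng(seed)
    # Random start inside the eps-ball, kept in the valid pixel range [0, 1].
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(steps):
        g = input_gradient(W, b, x_adv, y_onehot)
        x_adv = x_adv + step_size * np.sign(g)    # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay in pixel range
    return x_adv

# Hypothetical toy data: 4 inputs with 8 features in [0, 1], 3 classes.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=(4, 8))
y_onehot = np.eye(3)[rng.integers(0, 3, size=4)]
W, b = rng.normal(size=(8, 3)), np.zeros(3)
x_adv = pgd_linf(W, b, x, y_onehot)
```

In adversarial training, examples such as `x_adv` replace or accompany the clean inputs in each training step. With a deep network, the closed-form gradient above would be replaced by automatic differentiation.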
