Mitigating Accuracy-Robustness Trade-off via Balanced Multi-Teacher Adversarial Distillation
Shiji Zhao, Xizhe Wang, Xingxing Wei
TL;DR
The paper tackles the accuracy-robustness trade-off inherent in Adversarial Training by introducing Balanced Multi-Teacher Adversarial Robustness Distillation (B-MTARD), in which a clean teacher guides the student on natural inputs and a robust teacher guides it on adversarial inputs. It proposes two balancing mechanisms: Entropy-Based Balance, which adjusts the teachers' temperatures so that their information entropy is equal and their knowledge scales are aligned, and Normalization Loss Balance, which adaptively weights the losses so that the student learns from each teacher at a similar speed. The approach is evaluated on CIFAR-10, CIFAR-100, and Tiny-ImageNet, where it achieves state-of-the-art Weighted Robust Accuracy under multiple white-box and black-box attacks and outperforms prior Adversarial Robustness Distillation (ARD) and multi-teacher distillation methods. The results suggest that assigning accuracy and robustness to separate teachers is practical, and the framework may extend to other multi-objective learning problems.
Abstract
Adversarial Training is a practical approach for improving the robustness of deep neural networks against adversarial attacks. Although it brings reliable robustness, it degrades performance on clean examples, which means a trade-off exists between accuracy and robustness. Recently, some studies have applied knowledge distillation to Adversarial Training, achieving competitive robustness, but accuracy on clean samples remains limited. In this paper, to mitigate the accuracy-robustness trade-off, we introduce Balanced Multi-Teacher Adversarial Robustness Distillation (B-MTARD), which guides the model's Adversarial Training process by applying a strong clean teacher and a strong robust teacher to handle the clean examples and adversarial examples, respectively. During optimization, to ensure that the different teachers show similar knowledge scales, we design the Entropy-Based Balance algorithm, which adjusts the teachers' temperatures to keep their information entropy consistent. In addition, to ensure that the student learns from the multiple teachers at a relatively consistent speed, we propose the Normalization Loss Balance algorithm, which adjusts the learning weights of the different types of knowledge. A series of experiments conducted on three public datasets demonstrates that B-MTARD outperforms state-of-the-art methods against various adversarial attacks.
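The two balancing ideas can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the update rules, so the bisection search for the temperature and the loss-ratio weighting below are assumptions chosen for clarity, and all function names and logit values are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def match_entropy_temperature(logits, target_entropy, lo=0.05, hi=50.0, iters=60):
    """Find a temperature whose softmax entropy is close to target_entropy.

    Assumption: bisection is used here for simplicity; it relies on the
    entropy of a temperature-scaled softmax growing with the temperature.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if entropy(softmax(logits, mid)) < target_entropy:
            lo = mid  # distribution too sharp: raise the temperature
        else:
            hi = mid  # distribution too flat: lower the temperature
    return 0.5 * (lo + hi)


def balanced_weights(current_losses, initial_losses):
    """Weight each teacher's loss by its relative remaining loss.

    Assumption: a teacher whose loss has dropped less (relative to its
    starting value) gets a larger weight, and the weights sum to 1.
    """
    ratios = [c / i for c, i in zip(current_losses, initial_losses)]
    total = sum(ratios)
    return [r / total for r in ratios]


# Hypothetical logits: the clean teacher is far more confident than the robust one.
clean_logits = [8.0, 1.0, 0.5, 0.2]
robust_logits = [2.0, 1.2, 0.9, 0.7]

# Soften the clean teacher until its entropy matches the robust teacher's.
target = entropy(softmax(robust_logits, 1.0))
clean_temperature = match_entropy_temperature(clean_logits, target)

# Hypothetical losses: the student has learned faster from the first teacher.
weights = balanced_weights(current_losses=[0.5, 1.0], initial_losses=[2.0, 2.0])
```

In this sketch the over-confident clean teacher receives a temperature above 1, which flattens its output until both teachers carry a comparable amount of information, and the teacher from which the student has learned more slowly receives the larger loss weight.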
