MEAT: Median-Ensemble Adversarial Training for Improving Robustness and Generalization

Zhaozhe Hu, Jia-Li Yin, Bin Chen, Luojun Lin, Bo-Hao Chen, Ximeng Liu

TL;DR

This work proposes Median-Ensemble Adversarial Training (MEAT), an easy-to-operate and effective method that addresses the robust overfitting of self-ensemble defenses at its source by searching for the median of the historical model weights.

Abstract

Self-ensemble adversarial training methods improve model robustness by ensembling models from different training epochs, for example through model weight averaging (WA). However, previous research has shown that self-ensemble defense methods in adversarial training (AT) still suffer from robust overfitting, which severely harms generalization performance. Empirically, in the late phases of training, AT overfits to the extent that the individual models used for weight averaging also overfit and produce anomalous weight values; because averaging fails to remove these weight anomalies, the self-ensemble model continues to undergo robust overfitting. To solve this problem, we tackle the influence of outliers in the weight space and propose an easy-to-operate and effective Median-Ensemble Adversarial Training (MEAT) method, which addresses robust overfitting in self-ensemble defense at its source by searching for the median of the historical model weights. Experimental results show that MEAT achieves the best robustness against the powerful AutoAttack and can effectively alleviate robust overfitting. We further demonstrate that most defense methods can improve their robust generalization and robustness by combining with MEAT.
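
To make the contrast between weight averaging and a median ensemble concrete, the following is a minimal sketch, not the authors' implementation. It assumes the median is taken coordinate by coordinate over stored per-epoch snapshots; the abstract does not specify which snapshots are kept or how the median is searched for, and the helper `ensemble_weights`, the parameter name `conv.weight`, and the toy values are all hypothetical.

```python
from statistics import mean, median


def ensemble_weights(snapshots, reducer):
    """Combine per-epoch weight snapshots coordinate by coordinate.

    snapshots: list of dicts mapping a parameter name to a flat list of floats.
    reducer:   function collapsing a list of floats into one float
               (mean for weight averaging, median for a median ensemble).
    """
    names = snapshots[0].keys()
    return {
        name: [reducer(values) for values in zip(*(s[name] for s in snapshots))]
        for name in names
    }


# Three hypothetical snapshots (assumed values, not from the paper).
# The last snapshot carries one anomalous weight in the third coordinate.
snapshots = [
    {"conv.weight": [0.10, -0.20, 0.30]},
    {"conv.weight": [0.12, -0.18, 0.28]},
    {"conv.weight": [0.11, -0.19, 5.00]},
]

averaged = ensemble_weights(snapshots, mean)            # pulled toward the outlier
median_ensembled = ensemble_weights(snapshots, median)  # ignores the outlier

print("WA    :", averaged["conv.weight"])
print("median:", median_ensembled["conv.weight"])
```

In this toy case the averaged third coordinate is dragged far from the typical values by the single anomalous snapshot, while the coordinate-wise median stays at a value that actually occurred in a non-anomalous snapshot. That is the intuition behind preferring a median when some historical weights are outliers.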

Paper Structure (11 sections, 4 equations, 2 figures, 2 tables)


Figures (2)

  • Figure 1: (a) Training curves of robust accuracy (%) against the PGD-20 attack for standard AT with and without WA, and for MEAT; (b) the distribution of weight values in the last convolutional layer. The learning rate drops at epoch 60. Compared to standard AT and WA, MEAT effectively mitigates robust overfitting while maintaining high robust accuracy.
  • Figure 2: Comparison of the adversarial loss landscapes of models trained by standard AT (a) and MEAT (b) using WRN-34-10 on CIFAR-10. The $z$ axis denotes the loss value. The plotted loss landscape function is $z=\ell(\theta+\frac{v_1}{\lVert v_1 \rVert}\lVert \theta \rVert + \frac{v_2}{\lVert v_2 \rVert}\lVert \theta \rVert)$, where $v_1$ and $v_2$ denote two random vectors sampled from a Gaussian distribution, and $\lVert \cdot \rVert$ denotes the Frobenius norm.
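
The loss landscape function in the Figure 2 caption can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' plotting code. The caption gives only the perturbed point; the grid coefficients `alpha` and `beta` used to sweep out a surface, the helper names, and the quadratic `toy_loss` standing in for the adversarial loss $\ell$ are all assumptions. Setting `alpha = beta = 1` reproduces the expression in the caption.

```python
import math
import random


def frobenius_norm(vec):
    """Frobenius norm of a flattened parameter vector."""
    return math.sqrt(sum(x * x for x in vec))


def scaled_direction(theta, rng):
    """Sample a Gaussian vector v and return v / ||v|| * ||theta||."""
    v = [rng.gauss(0.0, 1.0) for _ in theta]
    scale = frobenius_norm(theta) / frobenius_norm(v)
    return [scale * x for x in v]


def landscape_value(loss_fn, theta, d1, d2, alpha, beta):
    """Loss at theta + alpha * d1 + beta * d2 (alpha, beta are assumed)."""
    perturbed = [t + alpha * a + beta * b for t, a, b in zip(theta, d1, d2)]
    return loss_fn(perturbed)


def toy_loss(theta):
    """Hypothetical stand-in for the adversarial loss: a simple quadratic."""
    return sum(x * x for x in theta)


rng = random.Random(0)
theta = [1.0, -2.0, 0.5]           # hypothetical flattened model weights
d1 = scaled_direction(theta, rng)  # first rescaled random direction
d2 = scaled_direction(theta, rng)  # second rescaled random direction

coeffs = [-1.0, -0.5, 0.0, 0.5, 1.0]
grid = [
    [landscape_value(toy_loss, theta, d1, d2, a, b) for b in coeffs]
    for a in coeffs
]
```

Each entry of `grid` is one $z$ value of the surface. The center entry (both coefficients zero) is the loss at the unperturbed weights, and rescaling each direction to the norm of $\theta$ keeps the perturbation size comparable across models.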