Shedding More Light on Robust Classifiers under the lens of Energy-based Models

Mujtaba Hussain Mirza, Maria Rosaria Briglia, Senad Beadini, Iacopo Masi

TL;DR

This paper proposes Weighted Energy Adversarial Training (WEAT), a novel sample weighting scheme whose robust accuracy matches the state of the art on multiple benchmarks such as CIFAR-10 and SVHN and surpasses it on CIFAR-100 and Tiny-ImageNet.

Abstract

By reinterpreting a robust discriminative classifier as an Energy-based Model (EBM), we offer a new take on the dynamics of adversarial training (AT). Our analysis of the energy landscape during AT reveals that untargeted attacks generate adversarial images that are, from the point of view of the model, much more in-distribution (lower energy) than the original data. Conversely, we observe the opposite for targeted attacks. On the basis of this analysis, we present new theoretical and practical results that show how interpreting AT energy dynamics unlocks a better understanding: (1) AT dynamics are governed by three phases, and robust overfitting occurs in the third phase with a drastic divergence between natural and adversarial energies; (2) by rewriting the loss of TRadeoff-inspired Adversarial DEfense via Surrogate-loss minimization (TRADES) in terms of energies, we show that TRADES implicitly alleviates overfitting by aligning the natural energy with the adversarial one; (3) we empirically show that all recent state-of-the-art robust classifiers smooth the energy landscape, and we reconcile a variety of studies about understanding AT and weighting the loss function under the umbrella of EBMs. Motivated by rigorous evidence, we propose Weighted Energy Adversarial Training (WEAT), a novel sample weighting scheme that yields robust accuracy matching the state of the art on multiple benchmarks such as CIFAR-10 and SVHN and surpassing it on CIFAR-100 and Tiny-ImageNet. We further show that robust classifiers vary in the intensity and quality of their generative capabilities, and offer a simple method to push this capability, reaching a remarkable Inception Score (IS) and FID using a robust classifier without training for generative modeling. The code to reproduce our results is available at http://github.com/OmnAI-Lab/Robust-Classifiers-under-the-lens-of-EBM/ .
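To make the EBM reading of a classifier concrete, here is a minimal sketch in plain Python. It assumes the common convention that the joint energy is the negated logit, $E_{\boldsymbol{\theta}}(\mathbf{x},y) = -f_{\boldsymbol{\theta}}(\mathbf{x})[y]$, and the marginal energy is the negated log-sum-exp of the logits, $E_{\boldsymbol{\theta}}(\mathbf{x}) = -\log\sum_k \exp f_{\boldsymbol{\theta}}(\mathbf{x})[k]$. These definitions are not spelled out in this summary, and the function names and example logits are hypothetical. Under that assumption the cross-entropy equals $E_{\boldsymbol{\theta}}(\mathbf{x},y) - E_{\boldsymbol{\theta}}(\mathbf{x})$.

```python
import math

def marginal_energy(logits):
    """Assumed definition: E(x) = -log sum_k exp(logits[k]), via a stable log-sum-exp."""
    m = max(logits)
    return -(m + math.log(sum(math.exp(z - m) for z in logits)))

def joint_energy(logits, y):
    """Assumed definition: E(x, y) = -logits[y], the negated logit of class y."""
    return -logits[y]

def cross_entropy(logits, y):
    """Standard -log softmax(logits)[y], computed without the energy functions."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[y]

# Hypothetical logits for a single input with three classes.
logits = [2.0, -1.0, 0.5]
y = 0

# The energy gap E(x, y) - E(x) should coincide with the cross-entropy loss.
gap = joint_energy(logits, y) - marginal_energy(logits)
print(gap, cross_entropy(logits, y))
```

Under these assumed definitions the gap is never negative, and it shrinks toward zero as the logit of class y dominates the others.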

Paper Structure

This paper contains 17 sections, 4 theorems, 21 equations, 20 figures, 5 tables, and 1 algorithm.

Key Result

Proposition 1

The KL divergence between two discrete distributions $p(y|\mathbf{x})$ and $p(y|\mathbf{x}^{\star})$ can be interpreted using EBM in terms of energies. Proofs of Proposition 1 and Corollary 1 are in the section on the TRADES proof.
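As a sketch of how such an energy form can be obtained, assume the common convention $E_{\boldsymbol{\theta}}(\mathbf{x},y) = -f_{\boldsymbol{\theta}}(\mathbf{x})[y]$ and $E_{\boldsymbol{\theta}}(\mathbf{x}) = -\log\sum_k \exp f_{\boldsymbol{\theta}}(\mathbf{x})[k]$, so that $\log p(y|\mathbf{x}) = E_{\boldsymbol{\theta}}(\mathbf{x}) - E_{\boldsymbol{\theta}}(\mathbf{x},y)$. These definitions are an assumption, since this summary does not state them, and the paper's own notation and grouping of terms may differ. Expanding the KL divergence then gives a conditional term averaged under $p(y|\mathbf{x})$ plus a difference of marginal energies:

```latex
\begin{aligned}
\mathrm{KL}\big(p(y|\mathbf{x}) \,\|\, p(y|\mathbf{x}^{\star})\big)
&= \mathbb{E}_{k \sim p(y|\mathbf{x})}\big[\log p(k|\mathbf{x}) - \log p(k|\mathbf{x}^{\star})\big] \\
&= \mathbb{E}_{k \sim p(y|\mathbf{x})}\big[E_{\boldsymbol{\theta}}(\mathbf{x}) - E_{\boldsymbol{\theta}}(\mathbf{x},k)
   - E_{\boldsymbol{\theta}}(\mathbf{x}^{\star}) + E_{\boldsymbol{\theta}}(\mathbf{x}^{\star},k)\big] \\
&= \mathbb{E}_{k \sim p(y|\mathbf{x})}\big[E_{\boldsymbol{\theta}}(\mathbf{x}^{\star},k) - E_{\boldsymbol{\theta}}(\mathbf{x},k)\big]
   + E_{\boldsymbol{\theta}}(\mathbf{x}) - E_{\boldsymbol{\theta}}(\mathbf{x}^{\star})
\end{aligned}
```

The last step uses the fact that the marginal energies do not depend on $k$ and so leave the expectation unchanged.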

Figures (20)

  • Figure 1: (a) PGD untargeted attacks create points that heavily bias the energy landscape. The plot shows $E_{\boldsymbol{\theta}}(\mathbf{x})$ as a function of PGD steps, across non-robust networks of various depths on CIFAR-10. CIFAR-100 is available in the supplementary material. (b, c, d) $E_{\boldsymbol{\theta}}(\mathbf{x},y)$ as a function of $E_{\boldsymbol{\theta}}(\mathbf{x})$ for a subset of CIFAR-10 training data at various stages during SAT with 5-iteration PGD. Note that, for clarity, the axes are not in the same range across figures. The base of each arrow represents the original data point, while the slope of the arrow indicates the loss of the corresponding adversarial sample. The dashed black line corresponds to zero cross-entropy, where $E_{\boldsymbol{\theta}}(\mathbf{x},y)=E_{\boldsymbol{\theta}}(\mathbf{x})$, and an arrow parallel to this line indicates an adversarial sample with no loss. Arrows are color-coded by attack strength, from the strongest attacks to the weakest or negligible ones, with intermediate colors representing varying intensities.
  • Figure 2: Distributions of (a) $E_{\boldsymbol{\theta}}(\mathbf{x})$ and (b) $E_{\boldsymbol{\theta}}(\mathbf{x},y)$ for adversarial and natural inputs under several adversarial perturbations, both untargeted and targeted (-T), on the CIFAR-10 test set, using a non-robust model. Separate markers indicate adversarial and natural data.
  • Figure 3: Three phases in the energy dynamics during training: overfitting happens in the last phase, with a steep fall in $\Delta E_{\boldsymbol{\theta}}(\mathbf{x})$ for SAT. For TRADES, it stays almost constant.
  • Figure 4: Difference in energy between natural data $\mathbf{x}$ and $\mathbf{x}^{\star}$ for state-of-the-art methods in adversarial robustness. For each method we show the signed difference between $\mathbf{x}$ and $\mathbf{x}^{\star}$ for both $E_{\boldsymbol{\theta}}(\mathbf{x})$ and $E_{\boldsymbol{\theta}}(\mathbf{x},y)$; on top of each method we report the robust accuracy from croce2020reliable. The vertical axis is in symmetric log scale. The increase in robust accuracy correlates well with $\Delta E_{\boldsymbol{\theta}}(\mathbf{x})$ approaching zero and with a reduced spread of the distribution. One + marker indicates training with generated images by wang2023better, while the other + marker indicates training with additional data by carmon2019unlabeled for the CIFAR-10 dataset.
  • Figure 5: (a) Not perturbing high-energy samples (correctly classified) increases robust error, akin to not perturbing incorrectly classified samples as shown in wang2019improving. (b) Probabilistic Margins (PMs) as a function of $E_{\boldsymbol{\theta}}(\mathbf{x},y)$ and (c) of $E_{\boldsymbol{\theta}}(\mathbf{x})$. (d) Relationship between error rate, entropy and energy. (e) Trend of $E_{\boldsymbol{\theta}}(\mathbf{x},y)$ during the generative steps.
  • ...and 15 more figures
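
The signed energy differences discussed in the captions of Figures 3 and 4 can be sketched as follows. This is an illustration only: it assumes the sign convention $\Delta E_{\boldsymbol{\theta}}(\mathbf{x}) = E_{\boldsymbol{\theta}}(\mathbf{x}) - E_{\boldsymbol{\theta}}(\mathbf{x}^{\star})$, the negated log-sum-exp definition of the marginal energy, and the negated logit as joint energy, none of which is stated in this summary. The function names and the toy logits are hypothetical, and no attack is run here; the adversarial logits are simply given.

```python
import math
from statistics import mean, pstdev

def marginal_energy(logits):
    """Assumed definition: E(x) = -log sum_k exp(logits[k])."""
    m = max(logits)
    return -(m + math.log(sum(math.exp(z - m) for z in logits)))

def energy_gaps(nat_logits, adv_logits, y):
    """Signed differences (natural minus adversarial) for E(x) and E(x, y)."""
    d_marginal = marginal_energy(nat_logits) - marginal_energy(adv_logits)
    d_joint = (-nat_logits[y]) - (-adv_logits[y])
    return d_marginal, d_joint

# Hypothetical (natural logits, adversarial logits, label) triples.
batch = [
    ([3.0, 0.0], [0.0, 3.0], 0),   # logits swapped: E(x) is unchanged
    ([2.0, 1.0], [2.5, 4.0], 0),   # adversarial logits larger: lower energy
    ([1.0, 1.0], [1.0, 1.0], 1),   # no perturbation effect at all
]

gaps = [energy_gaps(nat, adv, y) for nat, adv, y in batch]
d_marginals = [g[0] for g in gaps]

# Mean and spread of the marginal gap over the batch.
print(mean(d_marginals), pstdev(d_marginals))
```

With this sign convention, a positive marginal gap means the adversarial point has lower energy than the natural one.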

Theorems & Definitions (6)

  • Proposition
  • Corollary
  • Proposition
  • Proof
  • Corollary
  • Proof