Lower Difficulty and Better Robustness: A Bregman Divergence Perspective for Adversarial Training

Zihui Wu; Haichang Gao; Bingqian Zhou; Xiaoyan Guo; Shudong Zhang

TL;DR

The paper tackles adversarial robustness in adversarial training (AT) by identifying optimization difficulty as a key bottleneck and introducing a Bregman-divergence perspective that links AT losses to KL-divergence and entropy-based geometry. It shows that TRADES, by separating accuracy and robustness objectives, is easier to optimize than PGD-AT, and proposes two methods, FAIT and MER, to further ease optimization while boosting robustness under 10-step PGD and AutoAttack. FAIT introduces an interpolated PGD path to decouple robustness losses; MER maximizes output entropy to make robustness learning easier. Experiments across CIFAR-10/100 with multiple architectures and alternative distances demonstrate improved robustness, scalability, and generality, providing concrete design guidelines for robust AT methods.
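For orientation, the two baseline objectives compared here are commonly written as follows. This is a sketch in the standard notation of the PGD-AT and TRADES literature, not necessarily the paper's own: $p_\theta(\cdot)$ is the model's output distribution, $\mathrm{CE}$ is cross-entropy, $\epsilon$ is the perturbation budget, and $\lambda$ is the tradeoff weight (some formulations use $\beta$ or $1/\lambda$ instead).

$$
\begin{aligned}
\text{PGD-AT:}\quad & \min_\theta \; \mathbb{E}_{(x,y)} \Big[ \max_{\|x' - x\| \le \epsilon} \mathrm{CE}\big(p_\theta(x'),\, y\big) \Big] \\
\text{TRADES:}\quad & \min_\theta \; \mathbb{E}_{(x,y)} \Big[ \mathrm{CE}\big(p_\theta(x),\, y\big) + \lambda \max_{\|x' - x\| \le \epsilon} \mathrm{KL}\big(p_\theta(x) \,\|\, p_\theta(x')\big) \Big]
\end{aligned}
$$

In this form, PGD-AT fits the label directly on the adversarial point, while TRADES uses one term for clean accuracy and a separate term for the consistency between clean and adversarial outputs, which is the sense of "separating" used in the summary above.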

Abstract

In this paper, we investigate how to improve the adversarial robustness obtained in adversarial training (AT) by reducing the difficulty of optimization. To better study this problem, we build a novel Bregman divergence perspective for AT, in which AT can be viewed as a sliding process of the training data points on the negative entropy curve. Based on this perspective, we analyze the learning objectives of two typical AT methods, PGD-AT and TRADES, and find that TRADES is easier to optimize than PGD-AT because TRADES separates the accuracy and robustness objectives that PGD-AT merges. In addition, we discuss the role of entropy in TRADES and find that models with high entropy can be better robustness learners. Inspired by these findings, we propose two methods, FAIT and MER, both of which reduce the difficulty of optimization under 10-step PGD adversaries and also provide better robustness. Our work suggests that reducing the difficulty of optimization under 10-step PGD adversaries is a promising approach for enhancing adversarial robustness in AT.
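The link between KL-divergence and the negative entropy curve rests on a standard identity: the Bregman divergence generated by negative entropy, $\phi(p) = \sum_i p_i \log p_i$, equals $\mathrm{KL}(p \,\|\, q)$ for probability vectors. The sketch below checks this numerically. The function names and the example vectors are illustrative assumptions, not taken from the paper, and the argument order of the KL term in the paper's losses may differ.

```python
import math

def neg_entropy(p):
    """phi(p) = sum_i p_i * log(p_i), the negative Shannon entropy."""
    return sum(pi * math.log(pi) for pi in p if pi > 0)

def grad_neg_entropy(p):
    """Gradient of phi: d/dp_i = log(p_i) + 1 (requires p_i > 0)."""
    return [math.log(pi) + 1.0 for pi in p]

def bregman_neg_entropy(p, q):
    """D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    g = grad_neg_entropy(q)
    inner = sum(gi * (pi - qi) for gi, pi, qi in zip(g, p, q))
    return neg_entropy(p) - neg_entropy(q) - inner

def kl(p, q):
    """KL(p || q) for probability vectors with q_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical output distributions for a clean and a perturbed input.
p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]

# The two quantities agree up to floating-point error.
gap = abs(bregman_neg_entropy(p, q) - kl(p, q))
print(f"Bregman: {bregman_neg_entropy(p, q):.6f}  KL: {kl(p, q):.6f}")
```

Geometrically, the Bregman divergence is the vertical gap at $p$ between the negative entropy curve and its tangent taken at $q$, which is what makes the picture of points sliding along the curve possible.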

Paper Structure

This paper contains 25 sections, 6 theorems, 27 equations, 4 figures, 6 tables, and 1 algorithm.

Introduction
Notations
Related work
A Bregman divergence perspective for AT
Relationship between AT and Bregman divergence
KL-divergence equivalent form.
Bregman divergence.
Binary classification analyses
Guideline 1: It is better to separate than to merge.
Guideline 2: High-entropy models are better robustness learners.
Remark 1.
Method
FAIT
MER
Experiments
...and 10 more sections

Key Result

Lemma 1

Given points $x_1$, $x_2$ and $x_*$, if $\exists \alpha \in [0,1]$ such that $p_\theta(x_*) = (1-\alpha) p_\theta(x_1) + \alpha p_\theta(x_2)$, then the following inequality holds true:

Figures (4)

Figure 1: Both proposed methods, MER and FAIT, help mitigate the robustness-accuracy tradeoff and provide better robustness than the TRADES baseline under AutoAttack (AA) [18] on CIFAR-10 with ResNet-18. $\lambda$ is a hyperparameter that balances the tradeoff, and the $\lambda$ giving the best robustness is marked with a star.
Figure 2: Illustrations of the Bregman divergence perspective of PGD-AT and TRADES.
Figure 3: Illustrations of $R_\theta$ in models $f_{\theta_{1}}$ and $f_{\theta_{2}}$ when $\mathcal{H}(f_{\theta_{1}},\epsilon) \leq \mathcal{H}(f_{\theta_{2}},\epsilon)$ under three different conditions.
Figure 4: Results of $\mathcal{R}_\theta$ and $\mathcal{R}_\theta^\prime$ during the training process of TRADES and TRADES-MER with the same $\lambda=9$.

Theorems & Definitions (11)

Lemma 1
Definition 3.1: Entropy upper bound
Definition 3.2: Identical adv-convergence
Theorem 1: $\mathcal{C}.1$
Theorem 2: $\mathcal{C}.2 , \mathcal{C}.3$
Lemma 2
Proof of Lemma \ref{Alemma:1}
Theorem 3: $\mathcal{C}.1$
Theorem 4: $\mathcal{C}.2 , \mathcal{C}.3$
Proof of Theorem \ref{theory:1}
...and 1 more
