Robustness Against Adversarial Attacks via Learning Confined Adversarial Polytopes

Shayan Mohajer Hamidi; Linfeng Ye

TL;DR

The paper addresses the vulnerability of deep neural networks to imperceptible adversarial perturbations by introducing CAP, a training framework that confines each sample’s adversarial polytope $\mathcal{Z}_{\boldsymbol{\epsilon}(\boldsymbol{x}_i)}$ to avoid crossing decision boundaries. CAP uses a two-stage strategy: (i) a particle-based corner-point detection that estimates multiple corners of the polytope and (ii) a center-pushing regularization that minimizes the distance of corner outputs to the polytope center $C_{\boldsymbol{x}_i}$. The training objective combines standard cross-entropy with a center-distance penalty, controlled by $\lambda$, and CAP leverages $N$ particles to better capture polytope geometry. Empirically, CAP outperforms vanilla AT, TRADES, and MART on CIFAR-10/100 and SVHN against strong attacks including AutoAttack, while maintaining competitive clean accuracy, and scales to larger WRN models, highlighting practical robustness gains.
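The objective described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation. It rests on several assumptions the summary does not confirm: the center $C_{\boldsymbol{x}_i}$ is taken to be the model output for the clean sample, the distance is squared Euclidean, and the cross-entropy is computed on the clean outputs. The name `cap_style_loss` and its arguments are hypothetical, and the corner outputs are assumed to come from a separate corner-detection step that is not shown.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def cap_style_loss(clean_logits, corner_logits, labels, lam):
    """Cross-entropy plus lam times the mean corner-to-center distance.

    clean_logits:  (B, K) outputs for clean samples (assumed polytope centers)
    corner_logits: (B, N, K) outputs for N estimated corner points per sample
    labels:        (B,) integer class labels
    lam:           weight of the center-distance penalty (lambda in the text)
    """
    batch = np.arange(clean_logits.shape[0])
    # Standard cross-entropy on the clean outputs (assumption).
    ce = -log_softmax(clean_logits)[batch, labels].mean()
    # Assumed distance: squared Euclidean between each corner output and the center.
    diffs = corner_logits - clean_logits[:, None, :]
    penalty = (diffs ** 2).sum(axis=-1).mean()
    return ce + lam * penalty, ce, penalty
```

When every corner output coincides with the center, the penalty vanishes and the loss reduces to plain cross-entropy; a larger `lam` trades clean-sample fit for more compact polytopes.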

Abstract

Deep neural networks (DNNs) can be deceived by human-imperceptible perturbations of clean samples. Enhancing the robustness of DNNs against adversarial attacks is therefore a crucial task. In this paper, we aim to train robust DNNs by limiting the set of outputs reachable via a norm-bounded perturbation added to a clean sample. We refer to this set as the adversarial polytope; each clean sample has its own adversarial polytope. If the polytopes of all samples are compact enough that they do not intersect the decision boundaries of the DNN, then the DNN is robust against adversarial samples. Hence, our algorithm is based on learning confined adversarial polytopes (CAP). Through a thorough set of experiments, we demonstrate that CAP improves model robustness over existing adversarial robustness methods against state-of-the-art attacks, including AutoAttack.
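One way to write down the set described in the abstract is sketched below. The symbols $f_{\theta}$ (the network), $\boldsymbol{\delta}$ (the perturbation), $p$ (the norm), $\epsilon$ (the budget), and $y_i$ (the true label) are assumptions introduced here for illustration; the paper's own definition may differ in detail.

```latex
% Adversarial polytope of sample x_i: all outputs reachable under a norm-bounded perturbation
\mathcal{Z}_{\boldsymbol{\epsilon}(\boldsymbol{x}_i)}
  = \left\{ f_{\theta}(\boldsymbol{x}_i + \boldsymbol{\delta}) \;:\; \|\boldsymbol{\delta}\|_{p} \le \epsilon \right\}

% The polytope does not cross a decision boundary if every reachable output keeps the true label
\arg\max_{k} z_k = y_i
  \quad \text{for all } \boldsymbol{z} \in \mathcal{Z}_{\boldsymbol{\epsilon}(\boldsymbol{x}_i)}
```

Under this reading, shrinking the polytope around an output that is already correctly classified makes the second condition easier to satisfy, which is the intuition behind confining the polytopes.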

Paper Structure (7 sections, 3 equations, 1 figure, 2 tables, 1 algorithm)

This paper contains 7 sections, 3 equations, 1 figure, 2 tables, 1 algorithm.

Introduction
Related work
Methodology
Detecting the corner points of $\mathcal{Z}_{\boldsymbol{\epsilon}(\boldsymbol{x}_i)}$
Pushing the corner points toward the center
Experiments
Conclusion

Figures (1)

Figure 1: The decision boundaries learned by a DNN for three classes. (a) A conventionally trained DNN, where the adversarial polytopes cross the decision boundaries. (b) A DNN trained with CAP, where the corner points of each polytope are pushed toward its center, making the polytopes more compact.

Theorems & Definitions (2)

Remark
Remark
