Robustness Against Adversarial Attacks via Learning Confined Adversarial Polytopes
Shayan Mohajer Hamidi, Linfeng Ye
TL;DR
The paper addresses the vulnerability of deep neural networks to imperceptible adversarial perturbations by introducing CAP, a training framework that confines each sample’s adversarial polytope $\mathcal{Z}_{\boldsymbol{\epsilon}(\boldsymbol{x}_i)}$ so that it does not cross the decision boundaries. CAP uses a two-stage strategy: (i) particle-based corner-point detection, which estimates multiple corners of the polytope, and (ii) center-pushing regularization, which minimizes the distance from the corner outputs to the polytope center $C_{\boldsymbol{x}_i}$. The training objective combines standard cross-entropy with a center-distance penalty weighted by $\lambda$, and CAP uses $N$ particles to better capture the polytope's geometry. Empirically, CAP outperforms vanilla AT, TRADES, and MART on CIFAR-10/100 and SVHN against strong attacks, including AutoAttack, while maintaining competitive clean accuracy, and it scales to larger WRN models.
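To make the shape of this objective concrete, the following is a minimal sketch of a cross-entropy loss plus a $\lambda$-weighted center-distance penalty for a single sample. It is not the authors' implementation. Several things here are assumptions made only for illustration: the model is a linear map `W @ x + b`, the clean output stands in for the polytope center, and the "corners" are random vertices of an $\ell_\infty$ ball of radius `eps` rather than the paper's particle-based corner-point detection. The function and parameter names (`cap_style_loss`, `n_particles`, `lam`) are hypothetical.

```python
import numpy as np


def softmax_cross_entropy(logits, label):
    # Numerically stable cross-entropy for a single sample.
    shifted = logits - np.max(logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]


def cap_style_loss(W, b, x, label, eps, lam, n_particles, rng):
    """Cross-entropy plus a center-distance penalty for one sample.

    Assumptions (not from the paper): a linear model f(x) = W @ x + b,
    the clean output used as the center, and corners approximated by
    random vertices of the l_inf ball of radius eps around x.
    """
    center = W @ x + b  # model output at the clean sample
    # Each particle is one random vertex of the l_inf ball: x + eps * (+/-1).
    signs = rng.choice([-1.0, 1.0], size=(n_particles, x.shape[0]))
    corners = (x + eps * signs) @ W.T + b  # outputs of the perturbed inputs
    # Mean squared distance from the corner outputs to the center output.
    penalty = np.mean(np.sum((corners - center) ** 2, axis=1))
    ce = softmax_cross_entropy(center, label)
    return ce + lam * penalty, ce, penalty


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
b = np.zeros(3)
x = rng.normal(size=4)
loss, ce, penalty = cap_style_loss(W, b, x, label=1, eps=0.1, lam=0.5,
                                   n_particles=8, rng=rng)
print(f"cross-entropy={ce:.4f} penalty={penalty:.4f} total={loss:.4f}")
```

In this toy setting, setting `lam` to zero recovers plain cross-entropy, and increasing it trades clean-sample fit for a tighter cluster of perturbed outputs around the center, which is the qualitative behavior the center-pushing term is meant to encourage.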
Abstract
Deep neural networks (DNNs) can be deceived by human-imperceptible perturbations of clean samples. Enhancing the robustness of DNNs against adversarial attacks is therefore a crucial task. In this paper, we aim to train robust DNNs by limiting the set of outputs reachable via a norm-bounded perturbation added to a clean sample. We refer to this set as the adversarial polytope; each clean sample has its own adversarial polytope. If the polytopes of all samples are compact enough that they do not intersect the decision boundaries of the DNN, then the DNN is robust against adversarial samples. Hence, our algorithm is based on learning confined adversarial polytopes (CAP). Through a thorough set of experiments, we demonstrate that CAP is more effective than existing adversarial robustness methods at improving the robustness of models against state-of-the-art attacks, including AutoAttack.
