Generalization Properties of Adversarial Training for $\ell_0$-Bounded Adversarial Attacks

Payam Delgosha; Hamed Hassani; Ramtin Pedarsani

TL;DR

We prove a novel, distribution-independent generalization bound for binary classification under $\ell_0$-bounded adversarial perturbations, and develop new coding techniques for bounding the combinatorial dimension of the truncated hypothesis class.

Abstract

It has been widely observed that neural networks are vulnerable to small additive perturbations of the input that cause misclassification. In this paper, we focus on $\ell_0$-bounded adversarial attacks and aim to theoretically characterize the performance of adversarial training for an important class of truncated classifiers. Such classifiers have been shown to perform strongly in the $\ell_0$-adversarial setting, both empirically and theoretically in the Gaussian mixture model. The main contribution of this paper is a novel, distribution-independent generalization bound for binary classification under $\ell_0$-bounded adversarial perturbations. Deriving a generalization bound in this setting poses two main challenges: (i) the truncated inner product is highly non-linear; and (ii) the maximization over the $\ell_0$ ball that arises in adversarial training is non-convex and highly non-smooth. To tackle these challenges, we develop new coding techniques for bounding the combinatorial dimension of the truncated hypothesis class.
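
To make challenge (i) concrete, the following is a minimal sketch of the truncated inner product, assuming the definition suggested by the figure captions: sort the coordinate-wise products $w_i x_i$, drop the $k$ smallest and the $k$ largest, and sum the rest. The function name and the test vectors are illustrative choices, not taken from the paper. The example shows that the operation is not additive in $\bm{x}$, while for $k = 0$ it reduces to the ordinary inner product.

```python
def truncated_inner_product(w, x, k):
    """Sum of the products w_i * x_i after dropping the k smallest and k largest.

    Assumed definition, inferred from the figure captions; requires 2 * k < len(w).
    """
    prods = sorted(wi * xi for wi, xi in zip(w, x))
    return sum(prods[k:len(prods) - k])


# Illustrative vectors (hypothetical, chosen to expose non-additivity).
w = (1, 1, 1, 1)
x1 = (4, 0, 0, 0)
x2 = (0, 4, 0, 0)
x_sum = tuple(a + b for a, b in zip(x1, x2))

lhs = truncated_inner_product(w, x_sum, k=1)
rhs = truncated_inner_product(w, x1, k=1) + truncated_inner_product(w, x2, k=1)
print(lhs, rhs)  # the two values differ, so the map is not linear in x

# With k = 0 nothing is truncated and the usual inner product is recovered.
plain = sum(wi * xi for wi, xi in zip(w, x_sum))
print(truncated_inner_product(w, x_sum, k=0) == plain)
```

Each of `x1` and `x2` has a single large coordinate that truncation removes, while their sum has two large coordinates of which only one is removed. This is why linear-classifier arguments do not carry over directly.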

Paper Structure (17 sections, 13 theorems, 92 equations, 2 figures)

Introduction
Problem Formulation
Main Results
Bounds on $\Pi_{\mathcal{T}_{d,k}}(n)$
Bounds on $\Pi_{\widetilde{\mathcal{T}}_{d,k}}(n)$
Formal Analysis
A Growth Bound for Truncated Inner Products
A Growth Bound for the Function Class $\widetilde{\mathcal{T}}_{d, k}$
Conclusion
Challenges of Working with the Truncated Inner Product
Proof of Lemma \ref{lem:tsum-l0-min-max}
Proof of Lemma \ref{lem:trp-sign-code}
Proof of Proposition \ref{prop:tip-growth-bound}
Proof of Lemma \ref{lem:Aw-large-k-I1-I2}
Proof of Lemma \ref{lem:vw-at-most-k-nonzero-trp-zero}
...and 2 more sections

Key Result

Theorem 1

For any joint distribution $\mathcal{D}$ on the label $y \in \{\pm 1\}$ and feature-vector $\bm{x} \in \mathbb{R}^d$, and any adversarial budget $0< k < d/2$, for $n > d+1$, if $\widehat{\bm{w}}_n$ denotes the model parameters obtained from adversarial training as in eq:hwn-def, with probability at where $c$ is a universal constant.

Figures (2)

Figure 1: Illustration of Lemma \ref{lem:informal-trp-sign-code} for $d=4$, $k=1$, $\bm{x} = (1,-1,2,-3)$, and $\bm{w} = (-5,-4,-1,1)$. From $\text{sgn}(\langle \bm{w}, \bm{x} \odot \bm{\beta}_j \rangle)$ for $1 \leq j \leq 6$ on the right, we realize that $w_1 x_1 \leq w_4 x_4 \leq w_3 x_3 \leq w_2 x_2$. This means that $\langle \bm{w}, \bm{x} \rangle_k = w_3 x_3 + w_4 x_4 = \langle \bm{w}, \bm{x} \odot \bm{\alpha}_6 \rangle$ whose sign can be read from the highlighted row on the left table.
Figure 2: $(a)$ Sorted elements in $\bm{u}$ are illustrated on top, and $\bm{u}' \in \mathcal{B}_0(\bm{u},k)$ on the bottom. To minimize $\mathsf{TSum}_k(\bm{u}')$, we need to make the top $k$ elements in $\bm{u}$ (orange block) smaller than $u_{(1)}$ (green block). After truncating the green and blue blocks in $\bm{u}'$, we get $\mathsf{TSum}_k(\bm{u}') = u_{(1)} + \dots + u_{(d-2k)}$. $(b)$ Similarly, the maximum is $u_{(2k+1)} + \dots + u_{(d)}$.
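
The two captions can be checked numerically. The sketch below assumes that $\mathsf{TSum}_k$ drops the $k$ smallest and $k$ largest entries and sums the rest, as the captions describe. All function names are hypothetical. The brute-force check uses a large constant as a stand-in for an arbitrarily large perturbation. That suffices here because the truncated sum is non-decreasing in each coordinate, so its extremes are reached by pushing the modified coordinates far down or far up.

```python
from itertools import combinations


def tsum(u, k):
    """Truncated sum: drop the k smallest and k largest entries, add the rest."""
    s = sorted(u)
    return sum(s[k:len(s) - k])


def truncated_inner_product(w, x, k):
    """Truncated sum of the coordinate-wise products w_i * x_i."""
    return tsum([wi * xi for wi, xi in zip(w, x)], k)


def tsum_range_closed_form(u, k):
    """(min, max) of TSum_k over vectors differing from u in at most k entries,
    using the expressions stated in the Figure 2 caption."""
    s = sorted(u)
    d = len(s)
    return sum(s[:d - 2 * k]), sum(s[2 * k:])


def tsum_range_brute_force(u, k, big=10**6):
    """Same range by enumeration; `big` stands in for an unbounded perturbation."""
    lo = hi = tsum(u, k)
    for size in range(1, k + 1):
        for idx in combinations(range(len(u)), size):
            for fill in (-big, big):
                v = list(u)
                for i in idx:
                    v[i] = fill  # overwrite the chosen coordinates
                t = tsum(v, k)
                lo, hi = min(lo, t), max(hi, t)
    return lo, hi


# Figure 1 example: d = 4, k = 1.
w = (-5, -4, -1, 1)
x = (1, -1, 2, -3)
print(truncated_inner_product(w, x, k=1))  # w_3 x_3 + w_4 x_4 = -2 + -3 = -5

# Figure 2 check on an illustrative vector (not from the paper).
u = [3, -1, 4, 1, -5, 9, 2]
print(tsum_range_closed_form(u, k=2))
print(tsum_range_brute_force(u, k=2))
```

The enumeration grows combinatorially in $d$ and $k$, so it is only a sanity check on small inputs. The closed form needs a single sort.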

Theorems & Definitions (25)

Definition 1: robust PAC learnability
Theorem 1
Lemma 1: Lemma \ref{lem:trp-sign-code} (informal)
Proposition 1: Proposition \ref{prop:tip-growth-bound} (informal)
Lemma 2
Proposition 2: Proposition \ref{prop:max-l0-growth-bound} (informal)
Proposition 3
Lemma 3
Proposition 4
Lemma 4
...and 15 more
