Provable Robustness Against a Union of $\ell_0$ Adversarial Attacks

Zayd Hammoudeh; Daniel Lowd

TL;DR

This paper proposes feature partition aggregation (FPA), a certified defense against the union of $\ell_0$ evasion, backdoor, and poisoning attacks. FPA derives its stronger robustness guarantees from an ensemble whose submodels are trained on disjoint feature sets.

Abstract

Sparse or $\ell_0$ adversarial attacks arbitrarily perturb an unknown subset of the features. $\ell_0$ robustness analysis is particularly well-suited for heterogeneous (tabular) data where features have different types or scales. State-of-the-art $\ell_0$ certified defenses are based on randomized smoothing and apply to evasion attacks only. This paper proposes feature partition aggregation (FPA) -- a certified defense against the union of $\ell_0$ evasion, backdoor, and poisoning attacks. FPA generates its stronger robustness guarantees via an ensemble whose submodels are trained on disjoint feature sets. Compared to state-of-the-art $\ell_0$ defenses, FPA is up to 3,000${\times}$ faster and provides larger median robustness guarantees (e.g., median certificates of 13 pixels over 10 for CIFAR10, 12 pixels over 10 for MNIST, 4 features over 1 for Weather, and 3 features over 1 for Ames), meaning FPA provides the additional dimensions of robustness essentially for free.
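The ensemble idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the round-robin partition, the toy nearest-centroid submodel, the tie-break toward the smaller label index, and all function names (`partition_features`, `train_centroid_submodel`, `fpa_votes`, `plurality_label`) are assumptions made for illustration. The point it shows is structural: because each feature belongs to exactly one submodel, arbitrarily perturbing one feature can change at most one vote.

```python
from collections import Counter


def partition_features(d, T):
    """Split feature indices 0..d-1 into T disjoint sets (round-robin is an assumption)."""
    return [list(range(t, d, T)) for t in range(T)]


def train_centroid_submodel(X, y, feats):
    """Toy nearest-centroid classifier that only ever reads the features in `feats`."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += row[f]
    centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

    def predict(x):
        def sq_dist(c):
            return sum((x[f] - m) ** 2 for f, m in zip(feats, centroids[c]))
        # Iterate labels in sorted order so distance ties go to the smaller label.
        return min(sorted(centroids), key=sq_dist)

    return predict


def fpa_votes(submodels, x, n_classes):
    """Count how many submodels vote for each label."""
    votes = [0] * n_classes
    for predict in submodels:
        votes[predict(x)] += 1
    return votes


def plurality_label(votes):
    """Label with the most votes; ties are broken toward the smaller label index."""
    return max(range(len(votes)), key=lambda c: (votes[c], -c))


# Hypothetical toy data: 4 instances, d = 4 features, 2 classes, T = 2 submodels.
X = [[0, 0, 0, 0], [0, 1, 0, 1], [5, 5, 5, 5], [5, 4, 5, 4]]
y = [0, 0, 1, 1]
parts = partition_features(d=4, T=2)
submodels = [train_centroid_submodel(X, y, p) for p in parts]

clean = fpa_votes(submodels, [0, 0, 0, 0], n_classes=2)
# Perturb feature 0 arbitrarily: only the submodel that owns feature 0 can change.
attacked = fpa_votes(submodels, [100, 0, 0, 0], n_classes=2)
```

Any classifier could replace the centroid model here; the disjointness of the feature sets, not the submodel type, is what bounds an attacker's influence on the vote counts.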

Paper Structure (81 sections, 9 theorems, 32 equations, 10 figures, 29 tables, 1 algorithm)

Introduction
Preliminaries
Notation
Threat Model
Our Objective
Related Work
$\ell_{0}$-Norm Certified Evasion Defenses
Instance-wise Certified Poisoning Defenses
Certifying Feature Robustness
Feature Robustness Under Plurality Voting
Understanding Theorem \ref{thm:TheoreticalResults:Top1:Plurality} More Intuitively
Top-$k$ Certified Feature Robustness
Feature Robustness Under Run-Off Elections
Case #1: Overtake $y_{\textnormal{RO}}$ in Round #2
Case #2: Eject $y_{\textnormal{RO}}$ from Round #1's Top-Two Labels
...and 66 more sections

Key Result

Theorem 3

(Certified Feature Robustness with Plurality Voting) For feature partition ${\mathcal{S}_{1}, \ldots, \mathcal{S}_{T}}$, let $f$ be an ensemble of $T$ submodels using the plurality-voting decision function, where the $t$-th submodel uses the features in $\mathcal{S}_{t}$. For instance ${(\mathbf{x},
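The theorem statement is cut off in this copy, so the sketch below does not reproduce it. It assumes the vote-gap bound commonly used by partition-aggregation defenses: each perturbed feature can flip at most one submodel's vote, and each flipped vote closes the gap between the plurality label and a competitor by at most two. The tie-break toward the smaller label index and the function name `certified_feature_robustness` are also assumptions.

```python
def certified_feature_robustness(votes):
    """Return (plurality label, certified number of perturbed features).

    Assumed bound: floor((c_pl - max_{y' != y_pl}(c_{y'} + 1[y' < y_pl])) / 2),
    where c_y is the number of submodels voting for label y and ties are
    broken toward the smaller label index. Requires at least two labels.
    """
    n_classes = len(votes)
    y_pl = max(range(n_classes), key=lambda c: (votes[c], -c))

    def effective_votes(c):
        # A smaller-index label wins ties against y_pl, so it effectively
        # starts one vote closer to overtaking.
        return votes[c] + (1 if c < y_pl else 0)

    strongest = max(effective_votes(c) for c in range(n_classes) if c != y_pl)
    return y_pl, (votes[y_pl] - strongest) // 2


# Hypothetical vote counts for T = 10 submodels over 3 labels.
label, radius = certified_feature_robustness([7, 2, 1])
```

With these hypothetical counts, two flipped votes leave the tally at 5 versus 4 in favor of label 0, while a third flip could hand label 1 the lead, which is why the assumed bound certifies two features.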

Figures (10)

Figure 1: Feature partition aggregation example prediction for test instance $\mathbf{x} \in \mathcal{X}$ with $n = 3$, $d = 4$, and $\lvert\mathcal{Y}\rvert = 3$. The features are partitioned across $T = 4$ submodels, where the $t$-th submodel uses only feature dimensions $\mathcal{S}_{t} = \{t\} \subset [4]$ and training set $D_{t}$, i.e., the tuple containing the $t$-th column of feature matrix $\mathbf{X}$ (denoted $\mathbf{X}_{t}$) and label vector $\mathbf{y} \coloneqq [y_{1}, y_{2}, y_{3}]$. $\mathbf{x}_{\mathcal{S}_{t}}$ denotes the subvector of $\mathbf{x}$ restricted to the feature dimensions in $\mathcal{S}_{t}$. Plurality label $y_{\textnormal{pl}} = 0$; runner-up label $y_{\textnormal{ru}} = 1$; and the predicted label with the run-off decision function is $y_{\textnormal{RO}} = 0$. Under the plurality voting decision function (Sec. \ref{sec:TheoreticalResults:Plural}), $f(\mathbf{x})$ has certified feature robustness $r_{\textnormal{pl}} = 0$. With the run-off decision function (Sec. \ref{sec:TheoreticalResults:RunOff}), $f(\mathbf{x})$'s certified feature robustness is $r_{\textnormal{RO}} = 1$.
Figure 4: Classification certified accuracy envelope for datasets CIFAR10 ($d = 1024$) and MNIST ($d = 784$) for feature partition aggregation (FPA) and baseline randomized ablation (RA). Each method's envelope considers the corresponding hyperparameters in Tables \ref{tab:App:MoreExps:Combined:Numerical:CIFAR10} and \ref{tab:App:MoreExps:Combined:Numerical:MNIST}, emulating a certified defense where the hyperparameters are roughly tuned to maximize the certified accuracy at each robustness level. Subfigures \ref{fig:App:MoreExps:Combined:Graphical:Classification:MaxPlot:CIFAR10:Top1} and \ref{fig:App:MoreExps:Combined:Graphical:Classification:MaxPlot:MNIST:Top1} visualize each method's certified accuracy envelope (larger is better); also shown in these subfigures is a naive baseline where the decision function always predicts label $f(\mathbf{x}) = 1$. Subfigures \ref{fig:App:MoreExps:Combined:Graphical:Classification:DiffPlot:CIFAR10:Top1} and \ref{fig:App:MoreExps:Combined:Graphical:Classification:DiffPlot:MNIST:Top1} visualize the improvement in certified accuracy when using FPA with the run-off decision function over the two randomized ablation baselines from Levine:2020:RandomizedAblation and Jia:2022:AlmostTightL0. FPA with run-off's certified accuracy advantage over Jia:2022:AlmostTightL0's version of RA was as large as 6.54pp and 12.74pp for CIFAR10 and MNIST, respectively. FPA's performance advantage was even larger over Levine:2020:RandomizedAblation's version of RA. The envelope plots' underlying numerical values are provided in Table \ref{tab:App:MoreExps:Combined:Numerical:CIFAR10} for CIFAR10 and Table \ref{tab:App:MoreExps:Combined:Numerical:MNIST} for MNIST.
Figure 5: Regression certified accuracy envelope for the Weather (Malinin:2021:Shifts; $d = 128$) and Ames (DeCock:2011:AmesHousing; $d = 352$) datasets for feature partition aggregation (FPA) and baseline randomized ablation (RA). Each method's envelope considers the corresponding hyperparameters in Tables \ref{tab:App:MoreExps:Combined:Numerical:Weather} and \ref{tab:App:MoreExps:Combined:Numerical:Ames}, emulating a certified defense where the hyperparameters are tuned to maximize each robustness level's certified accuracy. Subfigures \ref{fig:App:MoreExps:Combined:Graphical:Regression:MaxPlot:Weather} and \ref{fig:App:MoreExps:Combined:Graphical:Regression:MaxPlot:Ames} visualize each method's certified accuracy envelope (larger is better); also shown in these subfigures is a naive baseline that always predicts the median training data target value. Subfigures \ref{fig:App:MoreExps:Combined:Graphical:Regression:DiffPlot:Weather} and \ref{fig:App:MoreExps:Combined:Graphical:Regression:DiffPlot:Ames} visualize the improvement in certified accuracy when using FPA (with plurality voting) as the decision function over the two randomized ablation baselines from Levine:2020:RandomizedAblation and Jia:2022:AlmostTightL0. FPA with run-off's certified accuracy advantage over Jia:2022:AlmostTightL0's version of RA was as large as 21.9pp and 17.4pp for Weather and Ames, respectively. FPA's performance advantage was even larger over Levine:2020:RandomizedAblation's version of RA. FPA outperforms randomized ablation for smaller certified robustness values, while Jia:2022:AlmostTightL0's version of RA marginally outperformed both FPA and the naive baseline at larger robustness values. The envelope plots' underlying numerical values are provided in Table \ref{tab:App:MoreExps:Combined:Numerical:Weather} for Weather and Table \ref{tab:App:MoreExps:Combined:Numerical:Ames} for Ames.
Figure 6: Effect of Submodel Count $T$ on the Certified Feature Robustness: Mean certified accuracy (%) for our sparse defense, feature partition aggregation (FPA), across different submodel counts ($T$). The non-robust accuracy line visualizes the classification accuracy of a single model ($T = 1$) trained on all features; these single model prediction results are provided only for reference. For all four datasets, increasing $T$ decreases the classification accuracy but increases the maximum certifiable robustness.
Figure 7: Effect of the Number of Kept Features ($e$) on RA's Certified $\ell_{0}$-Norm Robustness: Mean certified accuracy (%) for baseline randomized ablation across different quantities of kept pixels ($e$). The non-robust accuracy line visualizes the peak accuracy of a single model ($T = 1$) trained on all features; these single model predictions are provided only for reference.
...and 5 more figures

Theorems & Definitions (18)

Theorem 3
Theorem 4
proof
Lemma 5
proof
Lemma 6
proof
Lemma 7
proof
proof
...and 8 more
