Hierarchical Randomized Smoothing

Yan Scholten; Jan Schuchardt; Aleksandar Bojchevski; Stephan Günnemann

TL;DR

Hierarchical randomized smoothing is a framework for making models provably robust against small changes to their inputs. It adds random noise only to a randomly selected subset of an input's entities, which yields stronger robustness guarantees while maintaining high accuracy.

Abstract

Real-world data is complex and often consists of objects that can be decomposed into multiple entities (e.g. images into pixels, graphs into interconnected nodes). Randomized smoothing is a powerful framework for making models provably robust against small changes to their inputs, by guaranteeing robustness of the majority vote when randomly adding noise before classification. Yet, certifying robustness on such complex data via randomized smoothing is challenging when adversaries do not arbitrarily perturb entire objects (e.g. images) but only a subset of their entities (e.g. pixels). As a solution, we introduce hierarchical randomized smoothing: We partially smooth objects by adding random noise only on a randomly selected subset of their entities. By adding noise in a more targeted manner than existing methods, we obtain stronger robustness guarantees while maintaining high accuracy. We initialize hierarchical smoothing using different noising distributions, yielding novel robustness certificates for discrete and continuous domains. We experimentally demonstrate the importance of hierarchical smoothing in image and node classification, where it yields superior robustness-accuracy trade-offs. Overall, hierarchical smoothing is an important contribution towards models that are both certifiably robust to perturbations and accurate.
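The two-level sampling described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: entities are assumed to be selected independently with probability `p`, the lower-level noise is assumed to be isotropic Gaussian with standard deviation `sigma`, and the function names `sample_hierarchical` and `smoothed_predict` as well as the `base_classifier` callable are hypothetical. The selection indicator is appended to the data as an extra column, following the idea of classifying on a higher-dimensional space that includes the indicator.

```python
import numpy as np


def sample_hierarchical(X, p, sigma, rng):
    """Draw one partially smoothed copy of X (shape N x D).

    Assumption: each of the N entities (rows) is selected independently
    with probability p; only selected rows receive Gaussian noise.
    """
    N, D = X.shape
    tau = rng.random(N) < p                      # upper level: which entities to noise
    noise = rng.normal(0.0, sigma, size=(N, D))  # lower level: candidate noise
    Z = X + tau[:, None] * noise                 # unselected rows stay untouched
    return Z, tau


def smoothed_predict(base_classifier, X, p, sigma, n_samples, n_classes, seed=0):
    """Majority vote of base_classifier over randomly smoothed inputs.

    base_classifier is a hypothetical callable that maps an
    N x (D + 1) array (data plus indicator column) to a class index.
    """
    rng = np.random.default_rng(seed)
    votes = np.zeros(n_classes, dtype=int)
    for _ in range(n_samples):
        Z, tau = sample_hierarchical(X, p, sigma, rng)
        # Append the selection indicator as an additional feature column.
        Z_aug = np.concatenate([Z, tau[:, None].astype(Z.dtype)], axis=1)
        votes[base_classifier(Z_aug)] += 1
    return int(votes.argmax()), votes
```

Setting `p = 1` in this sketch adds noise to every entity, and `p = 0` leaves the input unchanged; intermediate values trade off between the two. The sketch covers only sampling and voting; the robustness certificates themselves are not computed here.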

Paper Structure (29 sections, 22 theorems, 50 equations, 13 figures, 2 algorithms)

Introduction
Related work
Preliminaries and background
Randomized smoothing framework
Neyman-Pearson Lemma: The foundation of randomized smoothing certificates
Existing smoothing distributions for discrete and continuous data
Hierarchical smoothing distribution
Provable robustness certificates for hierarchical randomized smoothing
Point-wise robustness certificates for hierarchical randomized smoothing
Regional robustness certificates for hierarchical randomized smoothing
Initializing hierarchical randomized smoothing
Hierarchical randomized smoothing using Gaussian isotropic smoothing
Hierarchical randomized smoothing using sparse smoothing
Hierarchical randomized smoothing using ablation smoothing
Experimental evaluation
...and 14 more sections

Key Result

Lemma 1

Given $\bm{X},\tilde{\bm{X}}\in\mathcal{X}^{N\times D}$, distributions $\mu_{\bm{X}}, \mu_{\tilde{\bm{X}}}$, class label $y\in\mathcal{Y}$, probability $p_{\bm{X},y}$ and the set $S_\kappa \triangleq \left\{ \bm{W} \in \mathcal{X}^{N\times D} : \mu_{\tilde{\bm{X}}}(\bm{W}) \leq \kappa \cdot \mu_{\bm{X}}(\bm{W}) \right\}$ ...

Proof. See neyman1933ix and cohen2019certified.
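For orientation, the Neyman-Pearson lower bound can be written in its standard randomized-smoothing form (as in the cited cohen2019certified) using the symbols defined above. The threshold condition on $\kappa$ and the final inequality are assumptions taken from that standard form, not from the text here; $p_{\tilde{\bm{X}},y}$ is assumed to denote the probability of class $y$ under the smoothing distribution around $\tilde{\bm{X}}$.

```latex
\[
S_\kappa \triangleq \left\{ \bm{W} \in \mathcal{X}^{N\times D} :
  \mu_{\tilde{\bm{X}}}(\bm{W}) \leq \kappa \cdot \mu_{\bm{X}}(\bm{W}) \right\},
\qquad
\kappa \in \mathbb{R}_{+} \ \text{such that}\
\Pr_{\bm{W}\sim\mu_{\bm{X}}}\left[\bm{W} \in S_\kappa\right] = p_{\bm{X},y}
\]
\[
\Longrightarrow\quad
p_{\tilde{\bm{X}},y} \;\geq\;
\Pr_{\bm{W}\sim\mu_{\tilde{\bm{X}}}}\left[\bm{W} \in S_\kappa\right].
\]
```

Read this way, the bound says that the worst case for the perturbed input is attained on the region where the likelihood ratio $\mu_{\tilde{\bm{X}}}/\mu_{\bm{X}}$ is smallest, so a certificate only needs the probability mass of $S_\kappa$ under both distributions.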

Figures (13)

Figure 1: Hierarchical randomized smoothing: We first select a subset of all entities and then add noise to the selected entities only. We achieve stronger robustness guarantees while still maintaining high accuracy -- especially when adversaries can only perturb a subset of all entities. For example in social networks, adversaries typically control only a subset of all nodes in the entire graph.
Figure 2: We derive flexible and efficient robustness certificates for hierarchical randomized smoothing by certifying classifiers on a higher-dimensional space where the indicator $\boldsymbol{\tau}$ is added to the data.
Figure 3: Overview of the disjoint regions and probabilities to sample $\bm{Z}$ of each region. The absence of arrows into specific regions indicates a probability of zero. Only $\bm{Z}$ in region $\mathcal{R}_2$ can be sampled from both distributions $\Psi_{\bm{X}}$ and $\Psi_{\tilde{\bm{X}}}$. The likelihood ratios visualize the proof of Theorem 1.
Figure 4: Hierarchical smoothing significantly expands the Pareto-front w.r.t. robustness and accuracy in node and image classification. Left: Discrete hierarchical smoothing for node classification, smoothed GAT on Cora-ML ($r\!=\!1, r_d\!=\!40, r_a\!=\!0$). Right: Continuous hierarchical smoothing for image classification, smoothed ResNet50 on CIFAR10 ($r \!=\! 3, \epsilon \!=\! 0.35$). Non-smoothed GAT achieves 80%$\pm2\%$ clean accuracy on Cora-ML, ResNet50 94% on CIFAR10. Large circles and stars are dominating points for each certificate. Dashed lines connect dominating points across methods.
Figure 5: Discrete hierarchical smoothing significantly extends the Pareto-front w.r.t. robustness-accuracy (smoothed GAT on Cora-ML). Left: $r \!=\! 1, r_a=0, r_d = 20$. Right: $r = 1, r_a=0, r_d = 40$.
...and 8 more figures

Theorems & Definitions (34)

Lemma 1: Neyman-Pearson lower bound
Lemma 1: Discrete Neyman-Pearson lower bound
Proposition 1
Proposition 2
Theorem 1: Neyman-Pearson lower bound for hierarchical smoothing
Proposition 1
Corollary 1
Corollary 2
Proposition 1
proof
...and 24 more
