Intriguing Properties of Input-dependent Randomized Smoothing

Peter Súkeník, Aleksei Kuvshinov, Stephan Günnemann

TL;DR

In general, input-dependent smoothing suffers from the curse of dimensionality, which forces the variance function to have low semi-elasticity.

Abstract

Randomized smoothing is currently considered the state-of-the-art method to obtain certifiably robust classifiers. Despite its remarkable performance, the method is associated with various serious problems such as "certified accuracy waterfalls", the certification vs. accuracy trade-off, and even fairness issues. Input-dependent smoothing approaches have been proposed with the intention of overcoming these flaws. However, we demonstrate that these methods lack formal guarantees, so the resulting certificates are not justified. We show that, in general, input-dependent smoothing suffers from the curse of dimensionality, forcing the variance function to have low semi-elasticity. On the other hand, we provide a theoretical and practical framework that enables the use of input-dependent smoothing even in the presence of the curse of dimensionality, under strict restrictions. We present one concrete design of the smoothing variance function and test it on CIFAR10 and MNIST. Our design mitigates some of the problems of classical smoothing and is formally justified, yet further improvement of the design is still necessary.
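
As a reading aid, here is a hedged sketch of what low semi-elasticity means, using the standard definition of semi-elasticity (the derivative of the logarithm of a positive function). The symbol $\rho$ and the choice of the $\ell_2$ norm are assumptions made here for illustration; they are not notation taken from this page.

$$
\|\nabla \log \sigma(x)\|_2 \le \rho \ \text{ for all } x
\quad\Longrightarrow\quad
e^{-\rho\|\delta\|_2} \;\le\; \frac{\sigma(x_0+\delta)}{\sigma(x_0)} \;\le\; e^{\rho\|\delta\|_2}.
$$

Under this reading, a small $\rho$ means that $\sigma$ can change only by a small multiplicative factor between an input and its perturbation. For example, with $\rho = 0.01$ and $\|\delta\|_2 = 0.5$, the ratio $\sigma(x_0+\delta)/\sigma(x_0)$ stays within roughly $[0.995, 1.005]$.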

Paper Structure

This paper contains 35 sections, 22 theorems, 80 equations, 24 figures, 20 tables, 3 algorithms.

Key Result

Lemma 2.1

Out of all possible classifiers $f$ such that ${G_f(x_0)}_B \le p_B = 1-p_A$, the one for which ${G_{f}(x_0+\delta)}_B$ is maximized predicts class $B$ in a region obtained by thresholding the likelihood ratio, where the threshold parameter $r$ is fixed such that $\mathbb{P}_0(B)=p_B$. Note that we use $B$ to denote both the class and the region of that class.
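
The lemma has the shape of a Neyman-Pearson argument, which can be checked numerically. The sketch below is a Monte Carlo illustration and not the authors' implementation. It assumes isotropic Gaussian smoothing, with $\mathbb{P}_0 = \mathcal{N}(x_0, \sigma_0^2 I)$ and $\mathbb{P}_1 = \mathcal{N}(x_0+\delta, \sigma_1^2 I)$. The function names, the sample size, and the use of a log-likelihood-ratio threshold `t` in place of the paper's parameter $r$ are all hypothetical choices made here.

```python
import math
import random
from statistics import NormalDist


def log_density(x, mean, sigma):
    """Log-density of the isotropic Gaussian N(mean, sigma^2 I) at point x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * sq / sigma**2 - d * math.log(sigma) - 0.5 * d * math.log(2 * math.pi)


def sample(mean, sigma, rng):
    """Draw one point from N(mean, sigma^2 I)."""
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


def worst_case_prob_B(x0, delta, sigma0, sigma1, p_B, n=100_000, seed=0):
    """Estimate P_1(B) for the likelihood-ratio region B with P_0(B) ~= p_B.

    Assumption: B = {x : log f_1(x) - log f_0(x) >= t}, with t chosen
    empirically so that a fraction p_B of the P_0 samples falls into B.
    """
    rng = random.Random(seed)
    x1 = [a + b for a, b in zip(x0, delta)]

    def llr(x):
        # Log-likelihood ratio of P_1 against P_0 at x.
        return log_density(x, x1, sigma1) - log_density(x, x0, sigma0)

    # Empirical (1 - p_B)-quantile of the ratio under P_0 gives the threshold.
    llr0 = sorted(llr(sample(x0, sigma0, rng)) for _ in range(n))
    t = llr0[int((1.0 - p_B) * n)]

    # Mass that the same region receives under the shifted distribution P_1.
    hits = sum(llr(sample(x1, sigma1, rng)) >= t for _ in range(n))
    return hits / n


def constant_sigma_closed_form(delta_norm, sigma, p_B):
    """Known constant-sigma value: Phi(Phi^{-1}(p_B) + ||delta|| / sigma)."""
    nd = NormalDist()
    return nd.cdf(nd.inv_cdf(p_B) + delta_norm / sigma)
```

With `sigma0 == sigma1`, the estimate from `worst_case_prob_B([0.0, 0.0], [0.5, 0.0], 1.0, 1.0, 0.1)` should agree with `constant_sigma_closed_form(0.5, 1.0, 0.1)` up to sampling error. Choosing `sigma0 != sigma1` gives the input-dependent case, in which the region is no longer a half-space.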

Figures (24)

  • Figure 1: Motivating toy experiment. We use constant $\sigma=0.6$ and input-dependent $\sigma(x)$ equal in average to the constant $\sigma$. Left: Dataset and the variance function depicted as circles with the radius equal to $\sigma(x)$ and centers at the data points. Middle: Zoomed in part of the dataset and decision boundaries of the smoothed classifiers with constant $\sigma$ (red) and input-dependent $\sigma(x)$ (green). Note that we recover a part of the misclassified data points by using a more appropriate smoothing strength close to the decision boundary. Right: Certified accuracy plot. The waterfall effect vanishes since the points far from the decision boundary are certified with a correspondingly large $\sigma(x)$.
  • Figure 2: Decision regions of the worst-case classifier $f^*$. Left: $\sigma_0>\sigma_1$. Right: $\sigma_0<\sigma_1$.
  • Figure 3: Plots depicting the tightness of the results of the main theorem. Both panels show the largest threshold of $\sigma_1/\sigma_0$ for which the condition of the main theorem is satisfied (theoretical threshold) and the numerically computed threshold for which $\xi_>(0)$ exceeds 0.5 (practical threshold). Left: $p_A=0.9$. Right: $p_A=0.999$.
  • Figure 4: Comparison of certified accuracy plots for the method of Cohen et al. (2019) and our work.
  • Figure 5: The toy dataset.
  • ...and 19 more figures

Theorems & Definitions (45)

  • Lemma 2.1
  • Theorem 2.2
  • Theorem 2.3
  • Theorem 2.4: curse of dimensionality
  • Corollary 2.5: one-sided simpler bound
  • Theorem 3.1
  • Theorem 3.2
  • Theorem 4.1
  • Theorem 1.1
  • proof
  • ...and 35 more