Robustness of Deep ReLU Networks to Misclassification of High-Dimensional Data

Věra Kůrková

TL;DR

Local robustness at a given network input is analyzed by quantifying the probability that a small additive random perturbation of the input leads to misclassification. For deep ReLU networks, lower bounds on local robustness are derived in terms of the input dimensionality and the total number of network units.

Abstract

We present a theoretical study of the robustness of parameterized networks to random input perturbations. Specifically, we analyze local robustness at a given network input by quantifying the probability that a small additive random perturbation of the input leads to misclassification. For deep networks with rectified linear units, we derive lower bounds on local robustness in terms of the input dimensionality and the total number of network units.
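
The notion of local robustness described in the abstract can be illustrated numerically. The following is a minimal Monte Carlo sketch, not the paper's method: it assumes a binary classifier that thresholds the scalar output of a small fully connected ReLU network at zero, and it assumes perturbations drawn uniformly from the sphere of radius `r`. The function names and the network layout are hypothetical and chosen only for illustration.

```python
import numpy as np


def relu_net_label(x, weights, biases):
    """Binary label of a fully connected ReLU network (assumed layout).

    Hidden layers apply ReLU; the last layer is affine with one output,
    which is thresholded at zero to give a label in {0, 1}.
    """
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(W @ h + b, 0.0)
    out = weights[-1] @ h + biases[-1]
    return int(out[0] >= 0.0)


def sample_sphere(rng, d, r, n):
    """Draw n points uniformly from the sphere of radius r in R^d."""
    g = rng.standard_normal((n, d))
    return r * g / np.linalg.norm(g, axis=1, keepdims=True)


def misclassification_rate(x, r, weights, biases, n=2000, seed=0):
    """Estimate the probability that x + e gets a different label than x."""
    rng = np.random.default_rng(seed)
    base = relu_net_label(x, weights, biases)
    perturbations = sample_sphere(rng, x.shape[0], r, n)
    flipped = sum(
        relu_net_label(x + e, weights, biases) != base for e in perturbations
    )
    return flipped / n
```

One minus the returned rate is an empirical estimate of local robustness at `x` under these assumptions; the paper's bounds are analytical and do not rely on sampling.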

Paper Structure (5 sections, 4 theorems, 47 equations, 4 figures)

Key Result

Theorem 3.1

Let $f: {\mathbb R}^d \to \{0,1\}$, defined as $f(x)=\vartheta(v \cdot x +b)$, be the I/O function of a perceptron with the Heaviside activation $\vartheta$, and let $r>0$ be the size of a perturbation. Then for every $x \in{\mathbb R}^d$, with $a(x) = \| p(x) - x\|_2$, where $p(x)$ is the orthogonal projection of … (ii) for $a(x)=0$, … (iii) for $r < a(x)$, …
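
The geometric quantities in Theorem 3.1 can be made concrete with a short sketch. It is an illustration under stated assumptions, not a reproduction of the theorem's bounds, which are truncated above. It assumes that $a(x)$ is the Euclidean distance from $x$ to the hyperplane $v \cdot x + b = 0$, that $\vartheta(0) = 1$, and that perturbations are uniform on the sphere of radius $r$. The function names are hypothetical.

```python
import numpy as np


def heaviside_perceptron(x, v, b):
    """f(x) = theta(v . x + b), with the assumed convention theta(0) = 1."""
    return int(v @ x + b >= 0.0)


def distance_to_hyperplane(x, v, b):
    """Euclidean distance from x to the hyperplane {y : v . y + b = 0}."""
    return abs(v @ x + b) / np.linalg.norm(v)


def flip_fraction(x, v, b, r, n=5000, seed=0):
    """Fraction of perturbed inputs x + e, with ||e||_2 = r, whose label differs."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n, x.shape[0]))
    eps = r * g / np.linalg.norm(g, axis=1, keepdims=True)
    labels = ((x + eps) @ v + b >= 0.0).astype(int)
    return float(np.mean(labels != heaviside_perceptron(x, v, b)))
```

When $r < a(x)$, no perturbation of norm $r$ can reach the hyperplane, so `flip_fraction` returns zero. For $r > a(x)$, the perturbed points that change label lie on a spherical cap cut off by the hyperplane.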

Figures (4)

  • Figure 1: Spherical polar cap.
  • Figure 2: Area of perturbed patterns separated from $x$ by a hyperplane.
  • Figure 3: Robustness of a shallow Heaviside perceptron network.
  • Figure 4: Probability of a misclassification by a deep ReLU network.

Theorems & Definitions (4)

  • Theorem 3.1
  • Corollary 3.2
  • Theorem 4.1
  • Theorem 4.2