Understanding the geometry of deep learning with decision boundary volume

Matthew Burfitt, Jacek Brodzki, Pawel Dłotko

Abstract

For classification tasks, the performance of a deep neural network is determined by the structure of its decision boundary, whose geometry directly affects essential properties of the model, including accuracy and robustness. Motivated by a classical tube formula due to Weyl, we introduce a method to measure the decision boundary of a neural network through local surface volumes. This provides a theoretically justifiable and efficient measure that gives a geometric interpretation of the effectiveness of the model and is applicable to the high-dimensional feature spaces considered in deep learning. A smaller surface volume is expected to correspond to lower model complexity and better generalisation. We verify, on a number of image processing tasks with convolutional architectures, that decision boundary volume is inversely proportional to classification accuracy. Meanwhile, the relationship between local surface volume and generalisation for fully connected architectures is observed to be less stable between tasks. Therefore, for network architectures suited to a particular data structure, we demonstrate that smoother decision boundaries lead to better performance, as our intuition would suggest.
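The notion of a local surface volume can be illustrated with a small Monte Carlo sketch. It is not the authors' estimator: it assumes a toy linear classifier $f(x) = w \cdot x + b$, for which the distance from a point to the decision boundary is exactly $|f(x)| / \|w\|$, and it measures the fraction of a $\delta$-ball that lies within distance $\varepsilon$ of that boundary. For small $\varepsilon$, the volume of such a tube around a smooth hypersurface is approximately $2\varepsilon$ times the surface volume, which is why the tube fraction can serve as a proxy for it. The function names, parameters, and sampling scheme below are all hypothetical choices made for illustration.

```python
import numpy as np


def sample_ball(center, delta, n_samples, rng):
    """Draw points uniformly from the ball of radius delta about center."""
    n = center.shape[0]
    # Uniform directions: normalised Gaussian vectors.
    directions = rng.standard_normal((n_samples, n))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Radii with density proportional to r^(n-1), so the ball is filled uniformly.
    radii = delta * rng.random(n_samples) ** (1.0 / n)
    return center + directions * radii[:, None]


def local_tube_fraction(w, b, center, delta, eps, n_samples=100_000, seed=0):
    """Estimate the fraction of the delta-ball about center that lies within
    distance eps of the hyperplane w.x + b = 0 (toy linear decision boundary)."""
    rng = np.random.default_rng(seed)
    points = sample_ball(center, delta, n_samples, rng)
    # Exact point-to-hyperplane distance; only valid for a linear classifier.
    dist = np.abs(points @ w + b) / np.linalg.norm(w)
    return float(np.mean(dist <= eps))


if __name__ == "__main__":
    w = np.array([1.0, 0.0])
    # Boundary passes through the centre of the ball.
    print(local_tube_fraction(w, 0.0, np.zeros(2), delta=1.0, eps=0.1))
    # Boundary far from the ball: no sample falls inside the tube.
    print(local_tube_fraction(w, -5.0, np.zeros(2), delta=1.0, eps=0.1))
```

For a trained network the distance to the decision boundary is not available in closed form, so a practical estimator would need a different test for membership of the $\varepsilon$-neighbourhood; the sketch only conveys the quantity being measured.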

Paper Structure (28 sections, 2 theorems, 41 equations, 10 figures, 4 tables)

This paper contains 28 sections, 2 theorems, 41 equations, 10 figures, 4 tables.

Key Result

Proposition 3.1

For $\varepsilon,\theta>0$, a set of $N$ uniform random vectors chosen in the unit ball in $\mathbb{R}^n$ are pairwise $\varepsilon$-orthogonal with probability greater than $\theta$ when

Figures (10)

  • Figure 1: The volumes of a simpler decision boundary (orange) and its tubular neighbourhood (yellow) should be smaller than the volumes of a more complex decision boundary and its tubular neighbourhood (purple) when measured in the vicinity of two data manifolds (red and blue).
  • Figure 2: To the left, a demonstration of layer-wise operations in a convolutional neural network taking grayscale images as an input. On the right, a neural network trained to classify two classes of $2$-dimensional points. Class $0$ points are indicated by blue squares and the class $1$ points by orange triangles. The decision boundary is formed by the black line in-between the blue and orange decision regions of the two label classes.
  • Figure 3: Frequency of $\frac{\max_{j}\| x-x_j \|_2-\min_{j}\| x-x_j \|_2}{\min_{j}\| x-x_j \|_2}$ values for data points $x \neq x_j$ in the MNIST and CIFAR-10 data sets, respectively. In particular, the width of the distribution for the higher-dimensional CIFAR-10 data is smaller than the width of the MNIST data distribution.
  • Figure 4: Sample space of $\delta$-neighbourhoods about training points in which to estimate $\text{\bf TrainBvol}$, the $\varepsilon$-neighbourhood volume of the network decision boundary.
  • Figure 5: The points sampled on the boundary linearly between data classes in the first step (orange), within a $\delta$-neighbourhood of which the second-step estimate of $\text{\bf LAdvBvol}$, the $\varepsilon$-neighbourhood volume of the network decision boundary, is made.
  • ...and 5 more figures

Theorems & Definitions (3)

  • Remark 2.1
  • Proposition 3.1: Gorban2016
  • Theorem 5.1