Finite-time Lyapunov exponents of deep neural networks

L. Storm, H. Linander, J. Bec, K. Gustavsson, B. Mehlig

TL;DR

The maximal finite-time Lyapunov exponent of a deep neural network forms geometrical structures in input space, akin to coherent structures in dynamical systems.

Abstract

We compute how small input perturbations affect the output of deep neural networks, exploring an analogy between deep networks and dynamical systems, where the growth or decay of local perturbations is characterised by finite-time Lyapunov exponents. We show that the maximal exponent forms geometrical structures in input space, akin to coherent structures in dynamical systems. Ridges of large positive exponents divide input space into different regions that the network associates with different classes. These ridges visualise the geometry that deep networks construct in input space, shedding light on the fundamental mechanisms underlying their learning capabilities.
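
The paper's own two equations are not reproduced in this excerpt. As a reminder of the standard dynamical-systems construction, which we assume the abstract refers to: let $\mathbf{J}_L(\boldsymbol{x})=\partial\boldsymbol{x}^{(L)}/\partial\boldsymbol{x}^{(0)}$ be the Jacobian of the last hidden layer with respect to the input, with singular values $\Lambda_1^{(L)}\geq\Lambda_2^{(L)}\geq\dots$ (the symbols $\mathbf{J}_L$ and $\Lambda_i^{(L)}$ are our notation, not necessarily the authors'). The number of layers $L$ plays the role of time, so that

$$
\lambda_i^{(L)}(\boldsymbol{x}) = \frac{1}{L}\log\Lambda_i^{(L)}(\boldsymbol{x}),
\qquad
|\delta\boldsymbol{x}^{(L)}| \leq \mathrm{e}^{L\lambda_1^{(L)}(\boldsymbol{x})}\,|\delta\boldsymbol{x}^{(0)}| .
$$

Equality holds when the small input perturbation $\delta\boldsymbol{x}^{(0)}$ points along the maximal stretching direction, the right singular vector of $\mathbf{J}_L$ belonging to $\Lambda_1^{(L)}$. A positive $\lambda_1^{(L)}(\boldsymbol{x})$ therefore means that perturbations near $\boldsymbol{x}$ can grow exponentially with depth, and a negative one means that all of them decay.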

Paper Structure (2 equations, 4 figures)

This paper contains 2 equations, 4 figures.

Figures (4)

  • Figure 1: Classification with a fully connected feed-forward network. (a) Layout with two input components $x_1^{(0)}$ and $x_2^{(0)}$, $L$ hidden layers with $N=5$ neurons, and one output $x^{(L+1)}$ for classification. (b) Two-dimensional input plane (schematic) for a classification problem with a circular decision boundary that separates input patterns with targets $t=+1$ ($\blacksquare$) from those with $t=-1$ ($\Box$, green).
  • Figure 2: Geometrical FTLE structures in input space for different widths $N$ and depths $L$ of fully connected feed-forward neural networks trained on the data set shown schematically in Fig. 1(b). Shown are the colour-coded magnitude of $L \lambda_1^{(L)}(\boldsymbol{x})$ and the maximal stretching directions (black lines).
  • Figure 3: (a) Classification error for a fully connected feed-forward network with $L=2$ hidden layers with random (untrained) hidden weights and trained output weights, as a function of the number $N$ of hidden neurons per layer (solid black line). Also shown is the classification error for the fully trained network (dashed line). Both curves were obtained for the data set shown schematically in Fig. 1(b). (b) Evolution of the maximal-FTLE distribution as a function of training time measured in epochs [mehlig2021machine], for a network with $L=8$ hidden layers with $N=50$ neurons per layer. The weights were initialised with different variances, $\log GN\sigma_w^2=-0.2$ (blue), $0$ (green), and $0.2$ (red).
  • Figure 4: Maximal-FTLE field for the MNIST data [MNIST]. A fully connected feed-forward network with $N=20$ neurons per hidden layer, $L=16$ hidden layers, and a softmax layer with ten outputs was trained to a classification accuracy of 98.88%. The maximal FTLE was calculated for each of the $28^2$-dimensional inputs and projected to two dimensions (see text). (a) Training data in the non-linear projection. For each input, the maximal FTLE $\lambda_1^{(L)}$ is shown colour-coded (legend). The box contains 93% of the recognised digits $0$. A threefold blow-up of this box is also shown. The line represents a sequence of adversarial attacks from $9$ to $4$ (see text), with $\lambda_1^{(L)}(\boldsymbol{x})$ colour-coded. (b) Classification error on the test set as a function of $\lambda_1^{(L)}(\boldsymbol{x})$. (c) Predictive uncertainty $H$ (see text) as a function of $\lambda_1^{(L)}(\boldsymbol{x})$.
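
The quantities plotted in Figure 2, the field $L\lambda_1^{(L)}(\boldsymbol{x})$ and the maximal stretching directions, can be illustrated with a short NumPy sketch. This is not the authors' code. It assumes tanh activations, zero thresholds, Gaussian weights with variance $\sigma_w^2/N_{\rm in}$, and the usual definition $\lambda_1^{(L)}=L^{-1}\log\Lambda_1^{(L)}$ with $\Lambda_1^{(L)}$ the largest singular value of the input-to-last-hidden-layer Jacobian. The function names and network sizes are hypothetical, and the network is untrained, so the ridges reported in the paper should not be expected here.

```python
import numpy as np

def init_network(n_in, n_hidden, n_layers, sigma_w=1.0, seed=0):
    """Random Gaussian weights (variance sigma_w^2 / fan-in) and zero thresholds.
    Illustrative initialisation only; the paper's networks are trained."""
    rng = np.random.default_rng(seed)
    sizes = [n_in] + [n_hidden] * n_layers
    weights = [rng.normal(0.0, sigma_w / np.sqrt(m), size=(n, m))
               for m, n in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n) for n in sizes[1:]]
    return weights, biases

def forward(x, weights, biases):
    """State of the last hidden layer, x^(L), for input x = x^(0)."""
    for W, b in zip(weights, biases):
        x = np.tanh(W @ x + b)
    return x

def jacobian(x, weights, biases):
    """Jacobian d x^(L) / d x^(0), accumulated layer by layer via the chain rule."""
    J = np.eye(x.size)
    for W, b in zip(weights, biases):
        x = np.tanh(W @ x + b)
        # Layer Jacobian is diag(1 - tanh^2) @ W, since tanh' = 1 - tanh^2.
        J = ((1.0 - x**2)[:, None] * W) @ J
    return J

def max_ftle(x, weights, biases):
    """Return (lambda_1, v_1): maximal FTLE and maximal stretching direction
    (unit vector in input space) at the input x."""
    J = jacobian(np.asarray(x, dtype=float), weights, biases)
    _, s, vt = np.linalg.svd(J)          # singular values sorted in descending order
    return np.log(s[0]) / len(weights), vt[0]

# Evaluate the maximal-FTLE field on a grid in a two-dimensional input plane.
weights, biases = init_network(n_in=2, n_hidden=5, n_layers=4)  # hypothetical sizes
xs = np.linspace(-1.0, 1.0, 21)
field = np.array([[max_ftle([x1, x2], weights, biases)[0] for x1 in xs]
                  for x2 in xs])
```

Multiplying `field` by the number of layers gives the quantity $L\lambda_1^{(L)}(\boldsymbol{x})$ named in the Figure 2 caption, and the second return value of `max_ftle` gives the direction that could be drawn as a short line segment at each grid point. For a trained network, the same functions would be called with the trained weights and thresholds in place of the random ones.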