Permutative redundancy and uncertainty of the objective in deep learning

Vacslav Glukhov

TL;DR

Traditional architectures are shown to be polluted by an astronomical number of equivalent global and local optima. Remedies that reduce or eliminate these ghost optima are discussed, including forced pre-pruning, re-ordering, ortho-polynomial activations, and modular bio-inspired architectures.

Abstract

Implications of uncertain objective functions and permutative symmetry of traditional deep learning architectures are discussed. It is shown that traditional architectures are polluted by an astronomical number of equivalent global and local optima. Uncertainty of the objective makes local optima unattainable, and, as the size of the network grows, the global optimization landscape likely becomes a tangled web of valleys and ridges. Some remedies that reduce or eliminate ghost optima are discussed, including forced pre-pruning, re-ordering, ortho-polynomial activations, and modular bio-inspired architectures.

Paper Structure

This paper contains 19 sections, 1 theorem, 51 equations, 3 figures.

Key Result

Corollary 1

Let $A = (n_0, \dots, n_d)$ be a neural network architecture. Then any optimum of a loss function over the corresponding parameter space has $\|\pi(A)\|$ redundancies in the loss surface, that is, distinct optima that do not correspond to meaningfully distinct parameterizations.
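
To give a sense of scale for the corollary, the sketch below counts hidden-node relabelings for an architecture $A = (n_0, \dots, n_d)$. It assumes that $\pi(A)$ refers to permutations of nodes within each hidden layer, with inputs and outputs held fixed, so that the count is $\prod_{i=1}^{d-1} n_i!$. That reading of the notation is an assumption, not a definition taken from this page, and the function name `hidden_permutation_count` is hypothetical.

```python
from math import factorial, prod

def hidden_permutation_count(arch):
    """Count within-layer relabelings of hidden nodes.

    Assumption: only hidden layers (all entries except the first and last)
    can be permuted; input and output layers are treated as fixed.
    """
    hidden = arch[1:-1]
    return prod(factorial(n) for n in hidden)

# One hidden layer of 3 nodes: 3! relabelings.
print(hidden_permutation_count((4, 3, 2)))  # 6

# The count grows factorially with layer width; this prints a very large integer.
print(hidden_permutation_count((784, 100, 100, 10)))
```

Under this assumption, even modest layer widths give counts far beyond what could be enumerated, which is consistent with the "astronomical" wording of the abstract.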

Figures (3)

  • Figure 1: Invariance of a two-layer network with respect to the permutation of arbitrary two nodes in a layer. $x$'s are the network's inputs, $y$'s the outputs. Two nodes, $h_{1,2}$ and $h_{1,3}$, of a trained network can be exchanged preserving their inputs and outputs without changing the network's output.
  • Figure 2: Polynomial representation of a set of functions $f^m(x)$ as a one-layer network. Binary pre-pruning (see the section on pre-pruning) ensures non-permutability of the output layer: each $f^m(x)$ represents a different combination of features in the data.
  • Figure 3: Four EMNIST object targets (top) passed through the three bio-inspired spatial filters (middle), located in the feature space defined by the filter intensities (arbitrary units) (bottom, colour indicates different targets).
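
The invariance described in the Figure 1 caption can be checked numerically. The following is a minimal sketch, not the paper's code: it builds a small two-layer network with random weights and a tanh activation (both are illustrative assumptions), exchanges two hidden nodes together with their incoming and outgoing weights, and compares the outputs. The helper names `forward` and `swap_hidden` are hypothetical.

```python
import math
import random

def forward(x, W1, b1, W2, b2):
    """Two-layer network: tanh hidden layer followed by a linear output layer."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(W2, b2)]

def swap_hidden(W1, b1, W2, i, j):
    """Exchange hidden nodes i and j, carrying their inputs and outputs along."""
    W1s = [row[:] for row in W1]
    b1s = b1[:]
    W2s = [row[:] for row in W2]
    W1s[i], W1s[j] = W1s[j], W1s[i]      # incoming weights (rows of W1)
    b1s[i], b1s[j] = b1s[j], b1s[i]      # biases of the two nodes
    for row in W2s:                      # outgoing weights (columns of W2)
        row[i], row[j] = row[j], row[i]
    return W1s, b1s, W2s

random.seed(0)
n_in, n_hidden, n_out = 3, 4, 2
W1 = [[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [random.gauss(0, 1) for _ in range(n_hidden)]
W2 = [[random.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [random.gauss(0, 1) for _ in range(n_out)]
x = [0.5, -1.0, 2.0]

# Swap the second and third hidden nodes (zero-based indices 1 and 2).
W1s, b1s, W2s = swap_hidden(W1, b1, W2, 1, 2)

y = forward(x, W1, b1, W2, b2)
y_swapped = forward(x, W1s, b1s, W2s, b2)

# The parameters differ, but the outputs agree up to floating-point rounding.
same_output = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(y, y_swapped))
print(W1s != W1, same_output)
```

The comparison uses a tolerance rather than exact equality because swapping the nodes changes the order of summation in the output layer, which can alter the last few bits of the result.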

Theorems & Definitions (2)

  • Corollary
  • Conjecture