Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity

Amit Daniely, Roy Frostig, Yoram Singer

TL;DR

The paper develops a duality between neural networks and compositional kernels by introducing computation skeletons, which capture deep architectures. It shows that random initialization places the network's representation in a rich RKHS corresponding to the skeleton, so that optimizing the last layer alone is effectively a convex problem whose solutions can express bounded-norm functions in the associated kernel space. The framework explains architectural choices, highlights the favorable properties of ReLU activations, and yields insights for initialization and design before training. This dual view offers a principled route to understanding and guiding neural network learning, connecting architecture, optimization, and function-space expressivity.

Abstract

We develop a general duality between neural networks and compositional kernels, striving towards a better understanding of deep learning. We show that initial representations generated by common random initializations are sufficiently rich to express all functions in the dual kernel space. Hence, though the training objective is hard to optimize in the worst case, the initial weights form a good starting point for optimization. Our dual view also reveals a pragmatic and aesthetic perspective of neural networks and underscores their expressive power.

Paper Structure

This paper contains 30 sections, 18 theorems, 90 equations, 2 figures, 1 table.

Key Result

Theorem 2

Let ${\cal S}$ be a skeleton with $C$-bounded activations. Let ${\mathbf w}$ be a random initialization of ${\cal N}={\cal N}({\cal S},r)$ with [...]. Then, for all ${\mathbf x},{\mathbf x}'$, with probability of at least $1-\delta$, [...]
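The kind of statement this theorem makes, that a randomly initialized network's empirical kernel is close to the skeleton's kernel with high probability, can be illustrated numerically in the simplest setting. The sketch below is not the paper's construction in general: it assumes a single fully connected hidden layer of width `r`, unit-norm inputs, standard Gaussian weights, and the normalized ReLU $\sqrt{2}\max(0,t)$. The closed form in `relu_dual_kernel` is the standard degree-1 arc-cosine kernel, used here as an assumed stand-in for the dual kernel of that activation. All function names are hypothetical.

```python
import math
import random


def relu_dual_kernel(rho):
    """Assumed closed form: E[s(a) * s(b)] for s(t) = sqrt(2) * max(0, t),
    where (a, b) are standard Gaussians with correlation rho."""
    rho = max(-1.0, min(1.0, rho))  # guard against rounding outside [-1, 1]
    return (math.sqrt(1.0 - rho * rho) + (math.pi - math.acos(rho)) * rho) / math.pi


def empirical_kernel(x, y, r, rng):
    """Average of s(<w, x>) * s(<w, y>) over r random hidden units w ~ N(0, I)."""
    total = 0.0
    for _ in range(r):
        w = [rng.gauss(0.0, 1.0) for _ in x]
        ax = sum(wi * xi for wi, xi in zip(w, x))
        ay = sum(wi * yi for wi, yi in zip(w, y))
        # sqrt(2) * relu(ax) times sqrt(2) * relu(ay)
        total += 2.0 * max(0.0, ax) * max(0.0, ay)
    return total / r


def unit(v):
    """Scale v to unit Euclidean norm."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


rng = random.Random(0)
x = unit([1.0, 2.0, 0.5])
y = unit([0.3, -1.0, 2.0])
rho = sum(a * b for a, b in zip(x, y))
exact = relu_dual_kernel(rho)

# The gap between the empirical and the closed-form kernel should tend to
# shrink as the width r grows, roughly like 1 / sqrt(r).
for r in (10, 100, 1000, 10000):
    gap = abs(empirical_kernel(x, y, r, rng) - exact)
    print(f"r={r:>6}  |empirical - exact| = {gap:.4f}")
```

Because each hidden unit contributes an independent sample of the same expectation, the average concentrates around the closed-form value as `r` increases. This is only the one-layer case; deeper skeletons compose such kernels layer by layer.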

Figures (2)

  • Figure 1: Examples of computation skeletons.
  • Figure 2: A $(5,4)$-fold and a $5$-fold realization of the computation skeleton ${\cal S}$ with $d=2$.

Theorems & Definitions (33)

  • Definition 1
  • Definition 2: Realization of a skeleton
  • Definition 3: Random weights
  • Definition 4: Dual activation and kernel
  • Definition 5: Compositional kernels
  • Example 1: Convolutional vs. fully connected skeletons
  • Definition 6
  • Theorem 2
  • Theorem 3
  • Definition 7
  • ...and 23 more