Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity

Amit Daniely; Roy Frostig; Yoram Singer

TL;DR

The paper develops a duality between neural networks and compositional kernels, using computation skeletons to describe deep architectures. It shows that, under common random initializations, the network's initial representation approximates the kernel space (an RKHS) associated with its skeleton. Training only the last layer is then a convex problem whose solutions can express bounded-norm functions in that kernel space. The framework also helps explain architectural choices, highlights favorable properties of ReLU activations, and offers guidance on initialization and design before training begins, linking architecture, optimization, and function-space expressivity.

Abstract

We develop a general duality between neural networks and compositional kernels, striving towards a better understanding of deep learning. We show that initial representations generated by common random initializations are sufficiently rich to express all functions in the dual kernel space. Hence, though the training objective is hard to optimize in the worst case, the initial weights form a good starting point for optimization. Our dual view also reveals a pragmatic and aesthetic perspective of neural networks and underscores their expressive power.
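
As a rough illustration of the claim about random initialization, the following sketch checks the simplest case numerically: a single fully connected layer of randomly initialized ReLU units, evaluated on two unit-norm inputs. It assumes standard Gaussian weights and the normalized activation sqrt(2) * max(0, z), and compares the empirical inner product of the random features with the well-known closed-form dual of that activation (the degree-1 arc-cosine kernel). The function names (`relu_dual`, `random_unit_vector`, `empirical_kernel`), the input dimension, and the layer width are choices made for this example only; this is not the authors' general construction for arbitrary skeletons.

```python
import math
import random


def relu_dual(rho):
    """Closed-form dual of the normalized ReLU, sqrt(2) * max(0, z)."""
    rho = max(-1.0, min(1.0, rho))  # guard against rounding outside [-1, 1]
    return (math.sqrt(1.0 - rho * rho) + (math.pi - math.acos(rho)) * rho) / math.pi


def random_unit_vector(dim, rng):
    """Draw a point uniformly from the unit sphere in `dim` dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def empirical_kernel(x, y, width, rng):
    """Average of sigma(<w, x>) * sigma(<w, y>) over `width` random Gaussian units."""
    total = 0.0
    for _ in range(width):
        w = [rng.gauss(0.0, 1.0) for _ in range(len(x))]
        ax = math.sqrt(2.0) * max(0.0, sum(wi * xi for wi, xi in zip(w, x)))
        ay = math.sqrt(2.0) * max(0.0, sum(wi * yi for wi, yi in zip(w, y)))
        total += ax * ay
    return total / width


rng = random.Random(0)            # fixed seed so the run is repeatable
x = random_unit_vector(8, rng)    # assumed input dimension: 8
y = random_unit_vector(8, rng)
rho = sum(xi * yi for xi, yi in zip(x, y))
exact = relu_dual(rho)
approx = empirical_kernel(x, y, 20000, rng)  # assumed layer width: 20000
print(f"rho={rho:.3f} exact={exact:.4f} approx={approx:.4f}")
```

As the width grows, the empirical value should concentrate around the closed-form kernel value, which is the single-layer version of the statement that a randomly initialized representation approximates the dual kernel.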
