Fisher-Rao Metric, Geometry, and Complexity of Neural Networks

Tengyuan Liang; Tomaso Poggio; Alexander Rakhlin; James Stokes

TL;DR

The paper introduces the Fisher-Rao norm as an information-geometric, invariant capacity measure for deep networks and links it to natural gradient and generalization. It provides an analytical FR-norm identity and shows FR serves as an umbrella for existing norm-based capacities, establishing norm-comparison inequalities across several geometries. The authors develop generalization bounds for deep linear and rectified networks via FR-based geometry, and validate theoretical insights with CIFAR-10 experiments demonstrating stable FR behavior under over-parameterization and correlation with generalization gaps. The work offers a unifying geometric perspective on neural network capacity and suggests invariant optimization approaches aligned with the Fisher-Rao geometry.

Abstract

We study the relationship between geometry and capacity measures for deep neural networks from an invariance viewpoint. We introduce a new notion of capacity, the Fisher-Rao norm, that possesses desirable invariance properties and is motivated by Information Geometry. We discover an analytical characterization of the new capacity measure, through which we establish norm-comparison inequalities and further show that the new measure serves as an umbrella for several existing norm-based complexity measures. We discuss upper bounds on the generalization error induced by the proposed measure. Extensive numerical experiments on CIFAR-10 support our theoretical findings. Our theoretical analysis rests on a key structural lemma about partial derivatives of multi-layer rectifier networks.
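The structural lemma mentioned in the abstract can be illustrated numerically. The sketch below is not the paper's statement or code; it assumes a bias-free network with ReLU activations and checks a standard consequence of positive homogeneity (Euler's identity): the inner product of the parameter gradient with the parameters equals the number of weight matrices times the network output. The layer sizes, the random seed, and all function names are hypothetical choices made for illustration.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def forward(weights, x):
    """Bias-free rectifier network: ReLU after every layer except the last."""
    h = x
    for W in weights[:-1]:
        h = relu(matvec(W, h))
    return matvec(weights[-1], h)[0]  # scalar output

def scale(weights, c):
    """Multiply every weight in every layer by the same constant c."""
    return [[[c * w for w in row] for row in W] for W in weights]

def grad_dot_theta(weights, x, h=1e-4):
    """Central difference of c -> f(c * theta) at c = 1.

    By the chain rule this derivative equals <grad_theta f, theta>.
    """
    up = forward(scale(weights, 1 + h), x)
    down = forward(scale(weights, 1 - h), x)
    return (up - down) / (2 * h)

# Hypothetical architecture: 3 inputs, two hidden layers of width 4, 1 output.
rng = random.Random(0)
sizes = [3, 4, 4, 1]
weights = [[[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
x = [rng.gauss(0, 1) for _ in range(sizes[0])]

num_matrices = len(weights)          # 3 weight matrices in this sketch
f = forward(weights, x)
lhs = grad_dot_theta(weights, x)     # approximates <grad_theta f, theta>
rhs = num_matrices * f               # Euler's identity for a degree-3 homogeneous map
print(lhs, rhs)
```

Scaling all weights by a positive constant leaves every ReLU activation pattern unchanged, so the output scales by that constant raised to the number of weight matrices. This is why the finite-difference check is accurate even though ReLU is not differentiable everywhere.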
