Metric Flows with Neural Networks

James Halverson, Fabian Ruehle

TL;DR

A general theory of flows in the space of Riemannian metrics induced by neural network (NN) gradient descent explains why NNs are better at learning Calabi–Yau metrics than fixed kernel methods, such as the Ricci flow.

Abstract

We develop a general theory of flows in the space of Riemannian metrics induced by neural network gradient descent. This is motivated in part by recent advances in approximating Calabi-Yau metrics with neural networks and is enabled by recent advances in understanding flows in the space of neural networks. We derive the corresponding metric flow equations, which are governed by a metric neural tangent kernel, a complicated, non-local object that evolves in time. However, many architectures admit an infinite-width limit in which the kernel becomes fixed and the dynamics simplify. Additional assumptions can induce locality in the flow, which allows for the realization of Perelman's formulation of Ricci flow that was used to resolve the 3d Poincaré conjecture. We demonstrate that such fixed kernel regimes lead to poor learning of numerical Calabi-Yau metrics, as is expected since the associated neural networks do not learn features. Conversely, we demonstrate that well-learned numerical metrics at finite-width exhibit an evolving metric-NTK, associated with feature learning. Our theory of neural network metric flows therefore explains why neural networks are better at learning Calabi-Yau metrics than fixed kernel methods, such as the Ricci flow.

Paper Structure (15 sections, 75 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: Graphical depiction of the different types of NN metric flows developed in this paper. Perelman's formulation of Ricci flow is realized as an infinite neural network metric flow under architectural assumptions that induce locality. Recent empirical successes in learning Calabi-Yau metrics realize many different flows in the outer ring.
  • Figure 2: Comparison of distance between points with no clustering, batch clustering and full clustering for 1000 points and 10 clusters, using 5 batches for batch clustering.
  • Figure 3: Left: Factor gained in the $\sigma$-loss as a function of average shortest geodesic distance between noised and sampled points. Right: Average, maximum, and minimum geodesic distance as a function of the number of sampled points.
  • Figure 4: Left: Histogram of metric-NTK spectrum at Epochs $0$ and $50$, demonstrating clear distribution shift. Right: Wasserstein distance between metric-NTK spectra at Epoch $0$ and stated epoch, demonstrating evolution during training.