Function approximation by neural nets in the mean-field regime: Entropic regularization and controlled McKean-Vlasov dynamics

Belinda Tzen, Maxim Raginsky

TL;DR

The paper illustrates the viability of the mean-field Langevin diffusion as a finite-time approximation under various conditions on entropic regularization, and shows that it closely tracks the Föllmer drift when the regularization is such that the minimizing density is log-concave.

Abstract

We consider the problem of function approximation by two-layer neural nets with random weights that are "nearly Gaussian" in the sense of Kullback-Leibler divergence. Our setting is the mean-field limit, where the finite population of neurons in the hidden layer is replaced by a continuous ensemble. We show that the problem can be phrased as global minimization of a free energy functional on the space of (finite-length) paths over probability measures on the weights. This functional trades off the $L^2$ approximation risk of the terminal measure against the KL divergence of the path with respect to an isotropic Brownian motion prior. We characterize the unique global minimizer and examine the dynamics in the space of probability measures over weights that can achieve it. In particular, we show that the optimal path-space measure corresponds to the Föllmer drift, the solution to a McKean-Vlasov optimal control problem closely related to the classic Schrödinger bridge problem. While the Föllmer drift cannot in general be obtained in closed form, thus limiting its potential algorithmic utility, we illustrate the viability of the mean-field Langevin diffusion as a finite-time approximation under various conditions on entropic regularization. Specifically, we show that it closely tracks the Föllmer drift when the regularization is such that the minimizing density is log-concave.
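
As a rough illustration of the mean-field Langevin diffusion mentioned in the abstract, the sketch below runs an Euler-Maruyama discretization on a finite particle ensemble for a two-layer net. It is not the authors' construction and does not compute the Föllmer drift; it is plain noisy gradient descent on the first variation of an empirical squared risk. The tanh neuron, the sin target, the step size, the inverse temperature `beta`, and all function names (`phi`, `predict`, `risk`, `langevin_step`, `run`) are assumptions chosen for the example.

```python
import math
import random


def phi(w, x):
    """Single neuron a * tanh(b * x) with weight w = (a, b) (assumed form)."""
    a, b = w
    return a * math.tanh(b * x)


def predict(ws, x):
    """Mean-field output: the average of the neurons over the particle ensemble."""
    return sum(phi(w, x) for w in ws) / len(ws)


def risk(ws, data):
    """Empirical squared risk, 1/(2n) times the sum of squared residuals."""
    return sum((predict(ws, x) - y) ** 2 for x, y in data) / (2 * len(data))


def langevin_step(ws, data, step, beta, rng):
    """One Euler-Maruyama step: drift along minus the gradient of the first
    variation of the risk, plus Gaussian noise of variance 2 * step / beta."""
    residuals = [(x, predict(ws, x) - y) for x, y in data]
    noise = math.sqrt(2.0 * step / beta)
    new_ws = []
    for a, b in ws:
        grad_a = 0.0
        grad_b = 0.0
        for x, r in residuals:
            t = math.tanh(b * x)
            grad_a += r * t                      # d/da of a * tanh(b x)
            grad_b += r * a * x * (1.0 - t * t)  # d/db of a * tanh(b x)
        grad_a /= len(data)
        grad_b /= len(data)
        new_ws.append((a - step * grad_a + noise * rng.gauss(0.0, 1.0),
                       b - step * grad_b + noise * rng.gauss(0.0, 1.0)))
    return new_ws


def run(n_particles=50, n_steps=300, step=0.1, beta=100.0, seed=0):
    """Fit sin(x) on a small grid; return (initial risk, final risk)."""
    rng = random.Random(seed)
    data = [(x, math.sin(x)) for x in (-2.0 + 0.5 * k for k in range(9))]
    # Particles start from an isotropic Gaussian, loosely mirroring a Gaussian prior.
    ws = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_particles)]
    initial = risk(ws, data)
    for _ in range(n_steps):
        ws = langevin_step(ws, data, step, beta, rng)
    return initial, risk(ws, data)


if __name__ == "__main__":
    before, after = run()
    print(f"risk before: {before:.4f}, after: {after:.4f}")
```

Here `1 / beta` plays the role of the entropic regularization strength: a smaller `beta` injects more noise and keeps the particle distribution closer to a diffuse one, while a larger `beta` approaches noiseless gradient descent. The finite horizon `n_steps * step` stands in for the finite-time setting; no claim is made about how closely this toy run tracks the optimal path measure.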

Paper Structure

This paper contains 18 sections, 10 theorems, 140 equations.

Key Result

Proposition 1

The minimum value of the free energy satisfies [...]. Moreover, if $\mu^\star \in \mathscr{P}({\mathbb R}^d)$ achieves the infimum in (eq:fe_reduction_2), then [...] achieves the infimum on the left-hand side of (eq:fe_reduction_1).

Theorems & Definitions (11)

  • Proposition 1
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Lemma 1
  • Definition A.1
  • Theorem A.1
  • Lemma A.1
  • Lemma A.2: Maurey's empirical method, high-probability version [ji2020transport]
  • ...and 1 more