Distribution learning via neural differential equations: minimal energy regularization and approximation theory

Youssef Marzouk, Zhi Ren, Jakob Zech

TL;DR

This paper develops a rigorous framework for distribution learning via neural differential equations. It shows that a large class of transport maps $T$ admits a time-dependent velocity field $f$ whose ODE flow realizes the straight-line interpolation $T_t(x)=(1-t)x+tT(x)$. It establishes that such velocity fields minimize a training objective with a minimum-energy regularization, and it derives explicit $C^k$ bounds for $f$ in terms of $T$, with bounds expressed through the source and target densities for Knothe–Rosenblatt maps. The authors prove distributional stability in the Wasserstein distance and KL divergence under velocity-field approximation errors and give explicit neural-network (NN) approximation rates: ReLU$^2$ nets can approximate $f$ with a network size bounded explicitly in terms of the target accuracy, the dimension, and the smoothness of the source and target densities, which yields quantitative Wasserstein and KL approximation guarantees. The work thus connects regularity, optimal-transport-inspired structure, and NN approximation to give explicit guarantees for distribution learning via neural ODEs. The results support principled sampling and density estimation with ODE-based transport and indicate the network size needed to reach a desired accuracy.
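One way to see why straight-line interpolations are natural minimizers of an energy-type penalty is the standard Cauchy–Schwarz (Jensen) argument below. This is a sketch under an assumption: that the penalty involves the kinetic energy $\int_0^1 \mathbb{E}|f(t,X_t)|^2\,dt$ of the flow $X_t$ started at $X_0 \sim \rho$, where $\rho$ is a hypothetical name for the source distribution. The exact regularizer used in the paper is not reproduced in this summary.

```latex
% Any C^1 path x:[0,1] \to \mathbb{R}^d with x(0) = x_0 and x(1) = T(x_0) satisfies
|T(x_0) - x_0|^2 = \Big| \int_0^1 \dot x(t)\,dt \Big|^2 \le \int_0^1 |\dot x(t)|^2\,dt,
% with equality iff \dot x is constant, i.e. x(t) = (1-t)\,x_0 + t\,T(x_0) = T_t(x_0).
% Applying this to each trajectory of dX_t/dt = f(t, X_t) with X_1 = T(X_0) and averaging over X_0:
\int_0^1 \mathbb{E}\,|f(t, X_t)|^2\,dt \;\ge\; \mathbb{E}\,|T(X_0) - X_0|^2,
% with equality exactly when every trajectory is the straight line t \mapsto T_t(X_0),
% i.e. when f(t, T_t(x)) = T(x) - x.
```

Among all flows that realize the same map $T$, the straight-line flow therefore has the smallest kinetic energy. The gap $\int_0^1 \mathbb{E}|f(t,X_t)|^2\,dt - \mathbb{E}|X_1 - X_0|^2 \ge 0$ vanishes exactly on straight-line flows.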

Abstract

Neural ordinary differential equations (ODEs) provide expressive representations of invertible transport maps that can be used to approximate complex probability distributions, e.g., for generative modeling, density estimation, and Bayesian inference. We show that for a large class of transport maps $T$, there exists a time-dependent ODE velocity field realizing a straight-line interpolation $(1-t)x + tT(x)$, $t \in [0,1]$, of the displacement induced by the map. Moreover, we show that such velocity fields are minimizers of a training objective containing a specific minimum-energy regularization. We then derive explicit upper bounds for the $C^k$ norm of the velocity field that are polynomial in the $C^k$ norm of the corresponding transport map $T$; in the case of triangular (Knothe–Rosenblatt) maps, we also show that these bounds are polynomial in the $C^k$ norms of the associated source and target densities. Combining these results with stability arguments for distribution approximation via ODEs, we show that Wasserstein or Kullback–Leibler approximation of the target distribution to any desired accuracy $\varepsilon > 0$ can be achieved by a deep neural network representation of the velocity field whose size is bounded explicitly in terms of $\varepsilon$, the dimension, and the smoothness of the source and target densities. The same neural network ansatz yields guarantees on the value of the regularized training objective.
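The straight-line construction and the stability statement can be illustrated in one dimension. The sketch below is not the paper's code. The map $T(x)=x+\tfrac12\sin x$, the bisection inverse `T_t_inv`, and the RK4 integrator `rk4` are illustrative choices. The sketch relies on the identity $\frac{d}{dt}T_t(x)=T(x)-x$: the velocity field must be $f(t,y)=T(x)-x$ with $x=T_t^{-1}(y)$, and integrating it from $x_0$ should land on $T(x_0)$. It then perturbs $f$ by a constant $\varepsilon$ and compares the endpoint gap with the textbook Grönwall bound $(\varepsilon/L)(e^{L}-1)$, where $L$ is a Lipschitz constant of $f$ in $y$. This is a standard estimate and not necessarily the form of the paper's stability theorems.

```python
import math


def T(x):
    # Hypothetical 1-D transport map (illustrative, not from the paper).
    # T'(x) = 1 + 0.5*cos(x) >= 0.5 > 0, so T and every interpolant T_t are increasing.
    return x + 0.5 * math.sin(x)


def T_t(t, x):
    # Straight-line interpolation T_t(x) = (1 - t) x + t T(x).
    return (1.0 - t) * x + t * T(x)


def T_t_inv(t, y, n_iter=60):
    # |T_t(x) - x| = 0.5 t |sin x| <= 0.5, so T_t^{-1}(y) lies in [y - 0.5, y + 0.5].
    # Bisection on this bracket; 60 halvings reach double-precision resolution.
    lo, hi = y - 0.5, y + 0.5
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if T_t(t, mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def velocity(t, y):
    # d/dt T_t(x) = T(x) - x, so the flow realizes T_t iff f(t, T_t(x)) = T(x) - x,
    # i.e. f(t, y) = T(x) - x with x = T_t^{-1}(y).
    x = T_t_inv(t, y)
    return T(x) - x


def rk4(f, x0, n_steps=100):
    # Classical 4th-order Runge-Kutta for dx/dt = f(t, x) on [0, 1].
    h = 1.0 / n_steps
    x = x0
    for i in range(n_steps):
        t = i * h
        k1 = f(t, x)
        k2 = f(t + 0.5 * h, x + 0.5 * h * k1)
        k3 = f(t + 0.5 * h, x + 0.5 * h * k2)
        k4 = f(t + h, x + h * k3)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x


x0 = 1.3

# 1) The ODE flow of f maps x0 to T(x0).
x1 = rk4(velocity, x0)
print("flow error:", abs(x1 - T(x0)))

# 2) Stability: f_eps = f + eps satisfies sup |f - f_eps| = eps. Here
#    |df/dy| = |0.5 cos x / (1 + 0.5 t cos x)| <= 1 =: L, and Gronwall gives
#    |X_1 - X_1^eps| <= (eps / L) * (exp(L) - 1).
eps, L = 1e-3, 1.0
x1_eps = rk4(lambda t, y: velocity(t, y) + eps, x0)
bound = (eps / L) * (math.exp(L) - 1.0)
print("endpoint gap:", abs(x1_eps - x1), "bound:", bound)
```

The exact velocity is constant along each trajectory, so RK4 reproduces $T(x_0)$ up to the bisection tolerance. The perturbed run stays below the Grönwall bound. Because that bound holds uniformly in $x_0$, it also bounds $W_p$ between the two pushforward distributions under the coupling that shares the starting point.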

Paper Structure

This paper contains 25 sections, 36 theorems, 170 equations, and 1 algorithm.

Key Result

Lemma 3.1

Let $\Omega_0\subseteq\mathbb{R}^d$ be convex and $T\in C^1(\Omega_0,\mathbb{R}^d)$ such that $\det \nabla_x T(x)\neq 0$ for all $x\in\Omega_0$. Then $T$ is injective.

Theorems & Definitions (75)

  • Remark 2.3
  • Lemma 3.1
  • Proof of Lemma 3.1
  • Lemma 3.3
  • Proof of Lemma 3.3
  • Theorem 3.4
  • Proof of Theorem 3.4
  • Remark 3.5
  • Example 3.6: Rotation
  • Definition 3.7
  • ...and 65 more