Optimal transport map estimation in general function spaces

Vincent Divol; Jonathan Niles-Weed; Aram-Alexandre Pooladian

TL;DR

We present a unified methodology for obtaining rates of estimation of optimal transport maps in general function spaces, and provide the first statistical rates of estimation when $P$ is the normal distribution and the transport map is given by an infinite-width shallow neural network.

Abstract

We study the problem of estimating a function $T$ given independent samples from a distribution $P$ and from the pushforward distribution $T_\sharp P$. This setting is motivated by applications in the sciences, where $T$ represents the evolution of a physical system over time, and in machine learning, where, for example, $T$ may represent a transformation learned by a deep neural network trained for a generative modeling task. To ensure identifiability, we assume that $T = \nabla \varphi_0$ is the gradient of a convex function, in which case $T$ is known as an optimal transport map. Prior work has studied the estimation of $T$ under the assumption that it lies in a Hölder class, but general theory is lacking. We present a unified methodology for obtaining rates of estimation of optimal transport maps in general function spaces. Our assumptions are significantly weaker than those appearing in the literature: we require only that the source measure $P$ satisfy a Poincaré inequality and that the optimal map be the gradient of a smooth convex function that lies in a space whose metric entropy can be controlled. As a special case, we recover known estimation rates for Hölder transport maps, but also obtain nearly sharp results in many settings not covered by prior work. For example, we provide the first statistical rates of estimation when $P$ is the normal distribution and the transport map is given by an infinite-width shallow neural network.
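As a toy illustration of the two-sample setting described in the abstract, the sketch below draws independent samples from $P$ and from $T_\sharp P$ in one dimension and fits a map from them. Everything specific here is an assumption made for illustration: $P$ is taken to be standard normal, the ground-truth map is the hypothetical $T(x) = 1 + 2x$ (the derivative of the convex potential $x + x^2$), and the function names `true_map` and `fit_affine_map` are invented. The estimator is a simple plug-in of the closed-form map between one-dimensional location-scale families, not the estimator analysed in the paper.

```python
import random
import statistics


def true_map(x):
    # Hypothetical ground truth T(x) = 1 + 2x, the derivative of the
    # convex potential phi_0(x) = x + x^2 (an assumption for this sketch).
    return 1.0 + 2.0 * x


def fit_affine_map(source, target):
    """Plug-in estimate of the 1D transport map between location-scale families.

    Uses the closed form x -> m_q + (s_q / s_p) * (x - m_p), with the means
    and standard deviations replaced by their empirical counterparts.
    """
    m_p, s_p = statistics.fmean(source), statistics.stdev(source)
    m_q, s_q = statistics.fmean(target), statistics.stdev(target)
    scale = s_q / s_p

    def t_hat(x):
        return m_q + scale * (x - m_p)

    return t_hat


rng = random.Random(0)
n = 5000

# Independent samples: xs from P, ys from the pushforward T#P.
# The ys are generated from fresh draws, so the two samples are unpaired.
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [true_map(rng.gauss(0.0, 1.0)) for _ in range(n)]

t_hat = fit_affine_map(xs, ys)

# Monte Carlo estimate of the squared L2(P) error of the fitted map.
test_points = [rng.gauss(0.0, 1.0) for _ in range(2000)]
err = statistics.fmean((t_hat(x) - true_map(x)) ** 2 for x in test_points)
print(f"estimated squared L2(P) error: {err:.5f}")
```

The samples are deliberately unpaired: the estimator never sees a point $x$ together with its image $T(x)$, which is what distinguishes this problem from ordinary regression.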

Paper Structure (36 sections, 46 theorems, 276 equations)

Introduction
Background on optimal transport under the quadratic cost
Main results
The bounded case
The strongly convex case
Examples
Transport between Location-Scale families
Finite set
Parametric space
Large parametric spaces
Wavelet expansions
Application: Transport between log-concave measures
ReQU neural networks
Reproducing Kernel Hilbert Spaces
"Spiked" potential functions
...and 21 more sections

Key Result

Proposition 1

Let $P$ be a probability distribution with subexponential tails. Consider one of the two following settings: Assume that there exists a constant $K$ such that $\|\nabla\varphi_1(0)-\nabla\varphi_0(0)\|\leq K$ and that $\varphi_0$ is convex. Let $Q \coloneqq (\nabla\varphi_0)_\sharp P$ and $S(\varphi_1) \coloneqq P(\varphi_1) + Q(\varphi_1^*)$. Denoting $\ell \coloneqq S(\varphi_1) - S(\varphi_0)$
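The statement above is cut off, but the quantity $\ell$ it introduces can be put in context with a standard fact, sketched here under the added assumptions that $\varphi_0$ is differentiable $P$-almost everywhere and that all integrals are finite. By the Fenchel-Young inequality, $\varphi_1(x) + \varphi_1^*(y) \geq \langle x, y \rangle$ for all $x, y$, with equality for $\varphi_0$ when $y = \nabla\varphi_0(x)$. Since $Q = (\nabla\varphi_0)_\sharp P$, this gives

$$
S(\varphi_1) = \int \big[ \varphi_1(x) + \varphi_1^*(\nabla\varphi_0(x)) \big] \, \mathrm{d}P(x) \;\geq\; \int \langle x, \nabla\varphi_0(x) \rangle \, \mathrm{d}P(x) = \int \big[ \varphi_0(x) + \varphi_0^*(\nabla\varphi_0(x)) \big] \, \mathrm{d}P(x) = S(\varphi_0),
$$

so $\ell = S(\varphi_1) - S(\varphi_0) \geq 0$: under these assumptions, $\varphi_0$ minimizes $S$ and $\ell$ measures the excess of $\varphi_1$ over that minimum.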

Theorems & Definitions (92)

Proposition 1: Map stability
Remark 1
Remark 2
Remark 3
Remark 4
Theorem 1
Lemma 1
Theorem 2
Remark 5
Proposition 2
...and 82 more
