Transportation of Measure Regression in Higher Dimensions

Laya Ghodrati, Victor M. Panaretos

Abstract

We present an optimal transport framework for performing regression when both the covariate and the response are probability distributions on a compact Euclidean subset $\Omega\subset\mathbb{R}^d$, where $d>1$. Beyond compactly supported distributions, the method also applies when both the covariates and the responses are Gaussian distributions on $\mathbb{R}^d$. Our approach generalizes an existing transportation-based regression model to higher dimensions. This model postulates that the conditional Fréchet mean of the response distribution is linked to the covariate distribution via an optimal transport map. We establish an upper bound on the rate of convergence of a plug-in estimator. We propose an iterative algorithm for computing the estimator based on DC (difference of convex functions) programming. In the Gaussian case, the estimator achieves a parametric rate of convergence, and computing it reduces to a finite-dimensional optimization over positive definite matrices, which can be solved efficiently. The performance of the estimator is demonstrated in a simulation study.
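
The abstract mentions an iterative algorithm based on DC programming. As a generic illustration only, and not the authors' algorithm or objective, the standard DC algorithm (DCA) minimizes $f = g - h$ with $g$ and $h$ convex by linearizing $h$ at the current iterate and solving the resulting convex problem $x_{k+1} = \arg\min_x \{ g(x) - \langle \nabla h(x_k), x \rangle \}$. The sketch below assumes a hypothetical one-dimensional toy objective $f(x) = x^4 - 2x^2$, split as $g(x) = x^4$ and $h(x) = 2x^2$, for which each convex subproblem has the closed-form solution $x = \sqrt[3]{h'(x_k)/4}$. The names `dca`, `grad_h` and `solve_convex` are illustrative.

```python
import numpy as np


def dca(x0, grad_h, solve_convex, n_iter=50):
    """Generic DC algorithm (DCA) for minimizing f = g - h, with g and h convex.

    Each step replaces h by its linearization at the current iterate and solves
    the convex subproblem argmin_x g(x) - grad_h(x_k) * x via solve_convex.
    """
    x = x0
    path = [x]
    for _ in range(n_iter):
        y = grad_h(x)          # gradient of the convex part h at x_k
        x = solve_convex(y)    # minimizer of g(x) - y * x
        path.append(x)
    return x, path


# Hypothetical toy objective f(x) = x**4 - 2*x**2, split as g(x) = x**4, h(x) = 2*x**2.
def f(x):
    return x**4 - 2.0 * x**2


def grad_h(x):
    return 4.0 * x


def solve_convex(y):
    # Stationarity of x**4 - y*x gives 4*x**3 = y, so x = cbrt(y / 4).
    return float(np.cbrt(y / 4.0))


x_star, path = dca(0.5, grad_h, solve_convex)
print(x_star)                                # close to the minimizer x = 1
print([round(f(x), 4) for x in path[:4]])    # non-increasing objective values
```

Each DCA step cannot increase $f$, because the concave part $-h$ is majorized by its linearization; here the iterates started at $0.5$ approach the minimizer $x = 1$.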

Paper Structure (10 sections, 17 theorems, 105 equations, 5 figures, 2 algorithms)

Key Result

Theorem 3.4

(Identifiability) Assume that the law $P$ induced by Model \ref{model-d} satisfies Assumptions \ref{assumptionMu}, \ref{assumptionT}, and \ref{noise_map}. Then the regressor operator $\Gamma(\mu)=T_0\#\mu$ in Model \ref{model-d} is identifiable over the class of maps $T \in \mathcal{T}$, up to $Q$-null sets. Specifically, … where for any $T\in \mathcal{T}$, …

Figures (5)

  • Figure 1: Illustration of Model \ref{model-d} for $d=2$, showing samples from $\mu$ (blue), $T_0\#\mu$ (black), and $\nu=T_\epsilon\#T_0\#\mu$ (red) for four different realisations of the error map, along with the corresponding displacement vector fields (flow curves).
  • Figure 2: Illustration of Model \ref{model-gaussian} for $d=2$, showing …
  • Figure 3: True vector field
  • Figure 4: Estimated vector field
  • Figure 5: Box plots of the logarithm of the squared error between the estimated matrix $\hat{T}_N$ and the true matrix $T_0$, based on 50 replications, for various combinations of $d$ and $N$. Since the numbers of pairs $N$ are powers of $2$ for each $d$, the alignment of the medians along a line of slope $-1/2$ illustrates the $N^{-1/2}$ convergence rate, in accordance with Theorem \ref{rate_of_convergence-gaussian}.
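
Figures 2 and 5 concern the Gaussian model, where estimation reduces to an optimization over positive definite matrices. The sketch below is illustrative only and assumes centered Gaussians $\mu = N(0, A)$, a linear regression map $x \mapsto Tx$ with $T$ positive definite, and, as a hypothetical plug-in criterion, the average squared 2-Wasserstein distance between $T\#\mu_i$ and $\nu_i$; the paper's exact estimator and error metric may differ. It relies on two standard facts: $T\#N(0,A) = N(0, TAT)$, and the optimal map from $N(0,A)$ to $N(0,B)$ has matrix $A^{-1/2}(A^{1/2}BA^{1/2})^{1/2}A^{-1/2}$, with $W_2^2 = \operatorname{tr}A + \operatorname{tr}B - 2\operatorname{tr}(A^{1/2}BA^{1/2})^{1/2}$. All function names are hypothetical.

```python
import numpy as np


def spd_sqrt(M):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    M = 0.5 * (M + M.T)                        # remove round-off asymmetry
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


def gaussian_ot_map(A, B):
    """Matrix of the optimal transport map x -> T x from N(0, A) to N(0, B)."""
    A_half = spd_sqrt(A)
    A_half_inv = np.linalg.inv(A_half)
    return A_half_inv @ spd_sqrt(A_half @ B @ A_half) @ A_half_inv


def push_forward_cov(T, A):
    """Covariance of T#N(0, A) for the linear map x -> T x."""
    return T @ A @ T.T


def w2_squared(A, B):
    """Squared 2-Wasserstein distance between N(0, A) and N(0, B)."""
    A_half = spd_sqrt(A)
    cross = spd_sqrt(A_half @ B @ A_half)
    return float(np.trace(A) + np.trace(B) - 2.0 * np.trace(cross))


def empirical_loss(T, covs_in, covs_out):
    """Hypothetical plug-in criterion: mean of W2^2(T#mu_i, nu_i) over the N pairs."""
    return float(np.mean([w2_squared(push_forward_cov(T, A), B)
                          for A, B in zip(covs_in, covs_out)]))


def random_spd(d, rng):
    """Random symmetric positive definite d x d matrix (illustration only)."""
    X = rng.standard_normal((d, d))
    return X @ X.T + d * np.eye(d)


rng = np.random.default_rng(0)
d, N = 3, 20
T0 = random_spd(d, rng)                                  # hypothetical true map matrix
covs_in = [random_spd(d, rng) for _ in range(N)]         # covariances of mu_i
covs_out = [push_forward_cov(T0, A) for A in covs_in]    # noise-free nu_i = T0#mu_i

print(empirical_loss(T0, covs_in, covs_out))        # numerically zero
print(empirical_loss(np.eye(d), covs_in, covs_out))  # positive
```

With noise-free responses the criterion vanishes at $T_0$, and the optimal map recovered from any single pair coincides with $T_0$, since a positive definite $T_0$ is itself the optimal map from $N(0,A)$ to $N(0, T_0 A T_0)$.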

Theorems & Definitions (33)

  • Theorem 3.4
  • Remark 3.5: Identifiability $Q$-almost everywhere
  • Definition 4.2
  • Remark 4.4
  • Theorem 4.5
  • Remark 4.6
  • Lemma 4.7: Semi-metric property of $\rho$
  • Theorem 4.8
  • Remark 4.9: The case $d=1$
  • Lemma 5.1
  • ...and 23 more