Neural Local Wasserstein Regression

Inga Girshfeld, Xiaohui Chen

TL;DR

The paper tackles distribution-on-distribution regression where both predictors and responses are probability measures, a setting where global optimal-transport maps and tangent-space linearizations can fail in high dimensions. It introduces Neural Local Wasserstein Regression, a nonparametric framework that learns covariate-dependent, locally defined transport maps in the $2$-Wasserstein space by combining kernel weights with neural parameterizations of transport operators. The approach employs DeepSets for distributional inputs and a U‑Net for image-like data, optimized via Sinkhorn-approximated $W_2$ losses and a data-driven bandwidth rule, enabling scalable local models around reference measures. Empirical results on Gaussian, Gaussian mixtures, and MNIST demonstrate that local transport captures nonlinear distributional relationships that global methods miss, with practical implications for high-dimensional distributional regression and robust geometry-aware learning.
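As a rough illustration of what a Sinkhorn-approximated $W_2$ loss computes, the sketch below estimates the entropic optimal-transport cost between two uniformly weighted point clouds with squared Euclidean ground cost. It is a minimal NumPy sketch, not the authors' implementation: the function names `sinkhorn_cost` and `_logsumexp`, the regularization strength `eps`, and the iteration count are all assumptions chosen for illustration, and the summary does not state which Sinkhorn variant the paper uses.

```python
import numpy as np

def _logsumexp(M, axis):
    # Numerically stable log(sum(exp(M))) along one axis.
    mx = M.max(axis=axis, keepdims=True)
    return (mx + np.log(np.exp(M - mx).sum(axis=axis, keepdims=True))).squeeze(axis)

def sinkhorn_cost(x, y, eps=0.5, n_iters=500):
    """Entropic OT between uniform empirical measures on x (n, d) and y (m, d).

    Returns the transport cost <P, C> and the plan P, where C is the squared
    Euclidean cost. eps and n_iters are illustrative defaults (assumptions).
    """
    n, m = x.shape[0], y.shape[0]
    log_a = np.full(n, -np.log(n))  # log of uniform source weights
    log_b = np.full(m, -np.log(m))  # log of uniform target weights
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    f = np.zeros(n)  # dual potential on the source side
    g = np.zeros(m)  # dual potential on the target side
    for _ in range(n_iters):
        # Log-domain Sinkhorn updates: alternately match row and column marginals.
        f = eps * (log_a - _logsumexp((g[None, :] - C) / eps, axis=1))
        g = eps * (log_b - _logsumexp((f[:, None] - C) / eps, axis=0))
    P = np.exp((f[:, None] + g[None, :] - C) / eps)
    return float((P * C).sum()), P

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 2))
shift = np.array([2.0, 0.0])
cost_same, _ = sinkhorn_cost(x, x)
cost_shift, plan = sinkhorn_cost(x, x + shift)
print(cost_same, cost_shift)
```

For a pure translation, the squared $W_2$ distance equals the squared norm of the shift, so the second cost should sit close to 4 plus a small entropic bias that grows with `eps`. In a training loop one would typically use a differentiable implementation from an autodiff framework instead of this NumPy version.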

Abstract

We study the estimation problem of distribution-on-distribution regression, where both predictors and responses are probability measures. Existing approaches typically rely on a global optimal transport map or tangent-space linearization, which can be restrictive in approximation capacity and distort geometry in multivariate underlying domains. In this paper, we propose Neural Local Wasserstein Regression, a flexible nonparametric framework that models regression through locally defined transport maps in Wasserstein space. Our method builds on the analogy with classical kernel regression: kernel weights based on the 2-Wasserstein distance localize estimators around reference measures, while neural networks parameterize transport operators that adapt flexibly to complex data geometries. This localized perspective broadens the class of admissible transformations and avoids the limitations of global map assumptions and linearization structures. We develop a practical training procedure using DeepSets-style architectures and Sinkhorn-approximated losses, combined with a greedy reference selection strategy for scalability. Through synthetic experiments on Gaussian and mixture models, as well as distributional prediction tasks on MNIST, we demonstrate that our approach effectively captures nonlinear and high-dimensional distributional relationships that elude existing methods.
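The kernel-regression analogy in the abstract can be made concrete with a small sketch of Wasserstein-based kernel weights. The example below is an illustration under stated assumptions, not the paper's procedure: it uses one-dimensional empirical measures with equal sample sizes (where $W_2$ has a closed form via sorted samples), a Gaussian kernel, and a hand-picked `bandwidth`. The names `w2_1d` and `kernel_weights` are hypothetical, and the paper's actual kernel and data-driven bandwidth rule are not given in this summary.

```python
import numpy as np

def w2_1d(x, y):
    """2-Wasserstein distance between two equal-size 1D empirical samples.

    In one dimension the optimal coupling matches sorted samples.
    """
    return float(np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2)))

def kernel_weights(reference, predictors, bandwidth):
    """Normalized Gaussian kernel weights of predictors around a reference measure.

    The Gaussian kernel and the fixed bandwidth are assumptions for illustration.
    """
    d = np.array([w2_1d(reference, p) for p in predictors])
    k = np.exp(-0.5 * (d / bandwidth) ** 2)
    return k / k.sum(), d

rng = np.random.default_rng(0)
base = rng.normal(size=200)                      # samples from the reference measure
predictors = [base + 0.1, base + 0.5, base + 3.0]  # translated copies of the reference
weights, dists = kernel_weights(base, predictors, bandwidth=1.0)
print(dists, weights)
```

Because each predictor is a translate of the reference sample, its $W_2$ distance equals the size of the shift, so predictors closer to the reference receive larger weights. These weights would then scale the per-sample losses of a local model fitted around that reference measure.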

Paper Structure

This paper contains 21 sections, 29 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Progression of pushforward distributions for a fixed reference measure $\mu_0^{(0)}$ over training epochs.
  • Figure 2: MNIST transformation results across multiple digit pairs. Each row shows a source digit (left), predicted output (middle), and ground-truth target (right). Images are treated as normalized discrete probability distributions while evaluating loss.