Generative Conditional Distributions by Neural (Entropic) Optimal Transport

Bao Nguyen, Binh Nguyen, Hieu Trung Nguyen, Viet Anh Nguyen

TL;DR

The paper tackles learning conditional distributions when data are scarce by proposing GENTLE, a neural optimal transport (OT) framework that learns a conditional transport map $T_\theta(x,U)$ and a Kantorovich potential $v_\phi$ via minimax optimization of an entropic OT objective. A fitness term based on kernel density estimation (KDE) aligns generated samples with the observed conditionals, while a Lipschitz-style regularizer built on entropic OT between nearby covariates promotes transfer of information across the covariate space. The method combines a minimum-spanning-tree-based neighborhood construction with a smoothed gradient-descent-ascent algorithm to stabilize training and convergence. On the LDW-CPS and ECM datasets, GENTLE achieves better distributional fidelity than state-of-the-art baselines, measured by lower Wasserstein distance (WD) and Kolmogorov-Smirnov (KS) scores, together with robust qualitative performance, which suggests practical value for decision-making under uncertainty with limited samples.
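The neighborhood construction mentioned above can be illustrated with a minimal sketch. It assumes that "nearby covariates" are the pairs joined by an edge of a Euclidean minimum spanning tree over the distinct covariate values; the paper's actual construction may differ. The function names `mst_edges` and `neighborhoods` and the toy `covariates` are hypothetical and are not taken from the GENTLE implementation.

```python
import math

def mst_edges(points):
    """Edges (i, j) of a Euclidean minimum spanning tree, built with Prim's algorithm."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known connection of each vertex to the tree
    best_from = [-1] * n        # tree vertex that realizes that connection
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        # Relax the connection cost of the remaining vertices through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges

def neighborhoods(points):
    """Map each covariate index to the set of indices adjacent to it in the tree."""
    nbrs = {i: set() for i in range(len(points))}
    for i, j in mst_edges(points):
        nbrs[i].add(j)
        nbrs[j].add(i)
    return nbrs

# Toy covariates (assumed for illustration only).
covariates = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.5), (5.0, 5.0)]
print(neighborhoods(covariates))
```

Because a spanning tree is connected, every covariate value, including one with a single observed response, ends up with at least one neighbor whose responses a regularizer could draw on.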

Abstract

Learning conditional distributions is challenging because the desired outcome is not a single distribution but multiple distributions that correspond to multiple instances of the covariates. We introduce a novel neural entropic optimal transport method designed to effectively learn generative models of conditional distributions, particularly in scenarios characterized by limited sample sizes. Our method relies on the minimax training of two neural networks: a generative network parametrizing the inverse cumulative distribution functions of the conditional distributions and another network parametrizing the conditional Kantorovich potential. To prevent overfitting, we regularize the objective function by penalizing the Lipschitz constant of the network output. Our experiments on real-world datasets show the effectiveness of our algorithm compared to state-of-the-art conditional distribution learning techniques. Our implementation can be found at https://github.com/nguyenngocbaocmt02/GENTLE.
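To make the idea of a generative network that parametrizes inverse cumulative distribution functions concrete, here is a toy sketch. It replaces the learned network with a closed-form map: it assumes, purely for illustration, that $Y \mid X = x$ is exponential with rate $1 + x$, so that the inverse CDF is $T(x, u) = -\log(1 - u) / (1 + x)$. The names `transport_map` and `sample_conditional` are hypothetical.

```python
import math
import random

def transport_map(x, u):
    """Toy T(x, u): inverse CDF of an Exponential distribution with rate 1 + x."""
    rate = 1.0 + x
    return -math.log(1.0 - u) / rate

def sample_conditional(x, n, seed=0):
    """Draw n samples of Y | X = x by pushing uniform noise U through T(x, .)."""
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so 1 - u stays strictly positive.
    return [transport_map(x, rng.random()) for _ in range(n)]

samples = sample_conditional(x=1.0, n=20000)
print(sum(samples) / len(samples))  # should be close to 1 / (1 + x) = 0.5
```

Since an inverse CDF is non-decreasing in $u$, a map of this form is monotone in the noise variable, which is the property a learned $T_\theta(x, U)$ would be expected to approximate.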

Paper Structure

This paper contains 17 sections, 1 theorem, 18 equations, 8 figures, 4 tables, 2 algorithms.

Key Result

Lemma 4.1

Suppose that $\mathcal{X}$ and $\mathcal{Y}$ are standard Borel spaces and that $(X, Y)$ is an $(\mathcal{X} \times \mathcal{Y})$-valued random variable. Then, there are random variables $U \sim \mathcal{U}(0, 1)$ coupled with $\mathcal{X}$ and $\mathcal{Y}$ and a Borel function $T : \mathcal{X} \ti

Figures (8)

  • Figure 1: Histogram of the number of observed responses for distinct covariate values in the LDW dataset ref:lalonde1986evaluating. Over 13,000 covariate values in this dataset have only one observed response.
  • Figure 2: Positive correlation between the Wasserstein distance between $Y(x_i)$ and $Y(x_j)$ and the covariate distance $\| x_i - x_j \|$. Only covariates $x$ with more than 18 observations are selected.
  • Figure 3: Qualitative results of the methods on the LDW-CPS dataset. The density curves 'GT', 'GENTLE' (ours), 'CWGAN', 'WGAN-GP', 'MGAN', and 'CDSB' are constructed by applying a kernel density estimate (KDE) to $Y_{\text{GT}}(x)$, $Y_{\text{GENTLE}}(x)$, $Y_{\text{CWGAN}}(x)$, $Y_{\text{WGAN-GP}}(x)$, $Y_{\text{MGAN}}(x)$, and $Y_{\text{CDSB}}(x)$, respectively. Each subfigure corresponds to a different value of $x$ in the test set. We observe that GENTLE reproduces the true distribution much more closely than the baselines.
  • Figure 4: Qualitative results of the methods on the ECM dataset. The density curves 'GT', 'GENTLE' (ours), 'CWGAN', 'WGAN-GP', 'MGAN', and 'CDSB' are constructed by applying a kernel density estimate (KDE) to $Y_{\text{GT}}(x)$, $Y_{\text{GENTLE}}(x)$, $Y_{\text{CWGAN}}(x)$, $Y_{\text{WGAN-GP}}(x)$, $Y_{\text{MGAN}}(x)$, and $Y_{\text{CDSB}}(x)$, respectively. Each subfigure corresponds to a different value of $x$ in the test set. We observe that GENTLE reproduces the true distribution much more closely than the baselines.
  • Figure 5: Empirical evidence for the monotonically increasing property of the learned network $T_\theta(x, U)$ in the variable $U$.
  • ...and 3 more figures
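
The Wasserstein distance in Figure 2 and the WD and KS scores used for evaluation compare one-dimensional response samples. A minimal sketch of both quantities for empirical samples follows, assuming scalar responses and, for the Wasserstein distance, equal sample sizes. The function names are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def wasserstein_1d(a, b):
    """W1 between two equal-size empirical samples: mean gap between sorted values."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(p - q) for p, q in zip(sorted(a), sorted(b))) / len(a)

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x, so ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

ground_truth = [0.0, 1.0, 2.0, 3.0]
generated = [1.0, 2.0, 3.0, 4.0]
print(wasserstein_1d(ground_truth, generated))  # 1.0
print(ks_statistic(ground_truth, generated))    # 0.25
```

Lower values of either score indicate that the generated sample is closer to the ground-truth sample for that covariate value.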

Theorems & Definitions (3)

  • Definition 3.1: Optimal Transport Distance, ref:villani2009optimal
  • Lemma 4.1: Noise Outsourcing, ref:austin2015exchangeable
  • Remark 4.2