Learning conditional distributions on continuous spaces

Cyril Bénézet; Ziteng Cheng; Sebastian Jaimungal

TL;DR

The paper addresses learning conditional distributions $P_x$ on continuous spaces by framing them as measure-valued mappings on unit boxes. It proposes two clustering-based estimators: the $r$-box estimator and the $k$-nearest-neighbor ($k$-NN) estimator. It derives convergence and concentration guarantees for both estimators in the Wasserstein metric, with rates that depend on the input and output dimensions and on the estimator type, and identifies optimal choices for the radius $r$ and the neighbor count $k$. To turn these nonparametric estimators into a scalable method, the authors introduce a Lipschitz-adaptive neural network $\tilde P^\Theta$ (LipNet) trained to mimic the empirical estimator. Training relies on approximate nearest neighbors search with random binary space partitioning (ANNS-RBSP), the Sinkhorn algorithm for efficient Wasserstein computation, and a convex-potential layer that enables local Lipschitz adaptivity. Empirical results on synthetic 1D and 3D data show LipNet producing smoother, more accurate conditional distributions and adapting its local Lipschitz continuity automatically; code is provided for reproducibility. The framework has potential applications in model-based reinforcement learning and risk-sensitive Markov decision processes (MDPs), where conditional distributions, not just conditional expectations, are needed for decision-making.
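
The two clustering schemes can be illustrated with a short numpy sketch. Given samples $(X_m, Y_m)$ and a query point $x$, the $r$-box estimator forms the empirical measure of the $Y_m$ whose $X_m$ lie within radius $r$ of $x$, and the $k$-NN estimator forms the empirical measure of the $Y_m$ attached to the $k$ nearest $X_m$. The function names, the sup-norm box for the $r$-box estimator, and the Euclidean distance for $k$-NN are assumptions made for illustration; this is not the authors' code. Each function returns atoms (rows of `Y`) with uniform weights.

```python
import numpy as np


def r_box_estimator(x_query, X, Y, r):
    """Empirical measure of the Y_m whose X_m lie in the box of radius r around x_query.

    The box is assumed to be the sup-norm ball {x : max_i |x_i - x_query_i| <= r}.
    Returns the atoms (rows of Y) and their uniform weights.
    """
    mask = np.max(np.abs(X - x_query), axis=1) <= r
    atoms = Y[mask]
    if atoms.shape[0] == 0:
        # No sample falls in the box, so the empirical measure is undefined here.
        return atoms, np.empty(0)
    weights = np.full(atoms.shape[0], 1.0 / atoms.shape[0])
    return atoms, weights


def knn_estimator(x_query, X, Y, k):
    """Empirical measure of the Y_m attached to the k nearest X_m (brute-force search)."""
    dists = np.linalg.norm(X - x_query, axis=1)   # Euclidean distance (assumed metric)
    idx = np.argpartition(dists, k - 1)[:k]       # indices of the k smallest distances
    atoms = Y[idx]
    weights = np.full(k, 1.0 / k)
    return atoms, weights
```

With `X` of shape `(M, d_X)` and `Y` of shape `(M, d_Y)`, `knn_estimator(x, X, Y, k)` returns `k` atoms in the target space, each carrying weight `1/k`. Both functions scan all `M` samples per query, which is the cost that approximate nearest neighbors search is meant to reduce.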

Abstract

We investigate sample-based learning of conditional distributions on multi-dimensional unit boxes, allowing the feature and target spaces to have different dimensions. Our approach clusters data near varying query points in the feature space to create empirical measures in the target space. We employ two distinct clustering schemes: one based on a fixed-radius ball and the other on nearest neighbors. We establish upper bounds on the convergence rates of both methods and, from these bounds, deduce optimal configurations for the radius and the number of neighbors. We propose to incorporate the nearest neighbors method into neural network training, as our empirical analysis indicates that it performs better in practice. For efficiency, our training process uses approximate nearest neighbors search with random binary space partitioning. Additionally, we employ the Sinkhorn algorithm and a sparsity-enforced transport plan. Our empirical findings demonstrate that, with a suitably designed structure, the neural network can adapt locally to a suitable level of Lipschitz continuity. For reproducibility, our code is available at https://github.com/zcheng-a/LCD_kNN.
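
The Sinkhorn algorithm mentioned above computes an entropy-regularized optimal transport plan between two discrete measures by alternately rescaling the rows and columns of the Gibbs kernel $e^{-C/\varepsilon}$. The sketch below is the textbook iteration in numpy. The regularization `eps`, the iteration count, and the function name are illustrative assumptions, and the sparsity-enforced transport plan described in the abstract is not reproduced here.

```python
import numpy as np


def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """Entropic optimal transport between discrete measures a and b with cost matrix C.

    Returns the transport plan P and the transport cost sum(P * C). When C holds
    pairwise distances, the cost approximates the Wasserstein-1 distance.
    """
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)           # match row marginals to a
        v = b / (K.T @ u)         # match column marginals to b
    P = u[:, None] * K * v[None, :]
    return P, float(np.sum(P * C))
```

With a cost matrix $C_{ij} = |y_i - y'_j|$ between the atoms of two empirical measures, `cost` approximates their Wasserstein-1 distance, and the approximation tightens as `eps` decreases. Very small `eps` makes `np.exp(-C / eps)` underflow; a log-domain implementation is the usual remedy.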

Paper Structure (40 sections, 13 theorems, 142 equations, 20 figures, 1 table, 3 algorithms)

Introduction
Main contributions
Related works
Estimating conditional distributions via clustering
Lipschitz continuity in neural networks
Organization of the paper
Theoretical results
Setup
Results on $r$-box estimator
Results on $k$-nearest-neighbor estimator
Comments on the convergence rate
On the convergence rate
On the fluctuation
Towards implementation with neural networks
Implementation with neural networks
...and 25 more sections

Key Result

Theorem 7

Under the Lipschitz-kernel assumption and the data assumption, choose $r$ as follows. Then there is a constant $C>0$, depending only on $d_{\mathbb{X}}, d_{\mathbb{Y}}, L, \underline c$, such that, for every probability distribution $\nu \in \mathcal{P}(\mathbb{X})$, we have

Figures (20)

Figure 1: An instance of RBSP in $[0,1]^2$.
Figure 2: Various estimators under Model 1 and 2, joint distributions.
Figure 3: Various estimators under Model 1 and 2, conditional CDFs.
Figure 4: Errors at different $x$'s of various estimators under Model 1 and 2.
Figure 5: Various estimators under Model 3, projections of conditional CDFs.
...and 15 more figures
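
Figure 1 shows an instance of random binary space partitioning (RBSP) in $[0,1]^2$, which the abstract uses for approximate nearest neighbors search during training. The numpy sketch below shows one generic way such a search can work. The splitting rule (a uniformly random coordinate, cut at a random rank within the middle half of the cell's points), the leaf size, and the function names are assumptions, not the authors' procedure. A query descends to a single leaf and returns the nearest points found there, so true neighbors lying in adjacent cells can be missed; this is what makes the search approximate.

```python
import numpy as np


def build_rbsp(X, idx, leaf_size, rng):
    """Recursively partition the points X[idx] into leaves of at most leaf_size points.

    Hypothetical splitting rule: pick a random coordinate, then cut at a random rank
    in the middle half of the sorted values, so both children are non-empty.
    """
    n = idx.size
    if n <= leaf_size:
        return ("leaf", idx)
    dim = int(rng.integers(X.shape[1]))                 # random coordinate to cut
    order = idx[np.argsort(X[idx, dim], kind="stable")]
    low = max(1, n // 4)
    high = max(low + 1, (3 * n) // 4)
    s = int(rng.integers(low, high))                    # random cut rank in [low, high)
    t = X[order[s], dim]                                # threshold used to route queries
    return ("node", dim, t,
            build_rbsp(X, order[:s], leaf_size, rng),
            build_rbsp(X, order[s:], leaf_size, rng))


def approx_knn(tree, X, q, k):
    """Descend to the query's leaf and return up to k nearest indices found there."""
    while tree[0] == "node":
        _, dim, t, left, right = tree
        tree = left if q[dim] < t else right
    cand = tree[1]
    d = np.linalg.norm(X[cand] - q, axis=1)
    return cand[np.argsort(d)[:k]]
```

Because the partition is random, rebuilding it with a fresh `rng` changes which neighbors are missed. A leaf may also hold fewer than `k` points, in which case `approx_knn` returns fewer than `k` indices.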

Theorems & Definitions (33)

Remark 1
Remark 4
Definition 5
Remark 6
Theorem 7
Theorem 8
Definition 9
Theorem 10
Theorem 11
Proposition 13
...and 23 more
