On the representation and learning of monotone triangular transport maps

Ricardo Baptista, Youssef Marzouk, Olivier Zahm

TL;DR

This work develops a rectification-based, semi-parametric framework for representing and learning monotone triangular transport maps, rooted in Knothe–Rosenblatt rearrangements. By transforming non-monotone function components via a bijective rectification operator, the authors convert the original constrained learning problem into an unconstrained one with a differentiable objective and, under suitable tail conditions, establish that local minima are global and the KR map is the unique global minimizer. They propose an adaptive algorithm (ATM) that builds sparse, interpretable map representations using polynomial or wavelet bases and cross-validation to adapt model complexity to data size, enabling effective density estimation, conditional density estimation, and structure learning of DAGs. The method demonstrates strong empirical performance across one- and two-dimensional targets, stochastic volatility, and tabular datasets, while uncovering conditional independence and sparsity patterns in learned maps. Overall, the rectification framework provides a principled, tractable path to learning informative, sparse transport maps with theoretical guarantees and practical scalability for complex distributions.

Abstract

Transportation of measure provides a versatile approach for modeling complex probability distributions, with applications in density estimation, Bayesian inference, generative modeling, and beyond. Monotone triangular transport maps, approximations of the Knothe–Rosenblatt (KR) rearrangement, are a canonical choice for these tasks. Yet the representation and parameterization of such maps have a significant impact on their generality and expressiveness, and on properties of the optimization problem that arises in learning a map from data (e.g., via maximum likelihood estimation). We present a general framework for representing monotone triangular maps via invertible transformations of smooth functions. We establish conditions on the transformation such that the associated infinite-dimensional minimization problem has no spurious local minima, i.e., all local minima are global minima; and we show for target distributions satisfying certain tail conditions that the unique global minimizer corresponds to the KR map. Given a sample from the target, we then propose an adaptive algorithm that estimates a sparse semi-parametric approximation of the underlying KR map. We demonstrate how this framework can be applied to joint and conditional density estimation, likelihood-free inference, and structure learning of directed graphical models, with stable generalization performance across a range of sample sizes.
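
To make the idea of building a monotone map from an unconstrained smooth function concrete, here is a minimal one-dimensional sketch. It assumes the transformation has the integrated form $S(x) = f(0) + \int_0^x g(f'(t))\,dt$ with the soft-plus $g(\xi) = \log(1 + \exp(\xi))$; the defining equation is not reproduced in this summary, so this form is an assumption. The test function `f`, the helper name `rectify`, and the quadrature order are hypothetical choices for illustration.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)); strictly positive for every real z.
    return np.logaddexp(0.0, z)

def rectify(f, df, x, n_quad=200):
    """Evaluate S(x) = f(0) + int_0^x softplus(f'(t)) dt at each point of x.

    Assumed form of the rectifier; f and df are a function and its derivative.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Gauss-Legendre nodes on [-1, 1], rescaled to the interval [0, x] per point.
    nodes, weights = np.polynomial.legendre.leggauss(n_quad)
    t = 0.5 * x[:, None] * (nodes[None, :] + 1.0)   # shape (len(x), n_quad)
    integral = 0.5 * x * (softplus(df(t)) @ weights)
    return f(0.0) + integral

# Hypothetical non-monotone function and its derivative.
f = lambda x: np.sin(2.0 * x) + 0.3 * x
df = lambda x: 2.0 * np.cos(2.0 * x) + 0.3

xs = np.linspace(-3.0, 3.0, 61)
S = rectify(f, df, xs)

print("f is monotone:", bool(np.all(np.diff(f(xs)) > 0)))
print("S is monotone:", bool(np.all(np.diff(S) > 0)))
```

Because the integrand $g(f'(t))$ is strictly positive, $S$ is increasing for any smooth $f$, which is why no monotonicity constraint has to be imposed on $f$ itself.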

Paper Structure

This paper contains 37 sections, 14 theorems, 102 equations, 9 figures, 3 tables, 2 algorithms.

Key Result

Proposition 1

Let $S_{\textrm{KR}}$ be the KR rearrangement pushing forward a distribution with density $\pi$ on $\mathbb{R}^d$ to the standard normal distribution on $\mathbb{R}^d$, with density $\eta$. For any map $S:\mathbb{R}^d\rightarrow\mathbb{R}^d$ as in eq:increasingMaps, we have

Figures (9)

  • Figure 1: The rectifier \ref{eq:rectifierintro} transforms the non-monotone function $f$ into the monotone function $S = \mathcal{R}(f)$. Here we choose $g(\cdot) = \log (1 + \exp(\cdot) )$. $S$ is an increasing transport that pushes forward a one-dimensional mixture of Gaussians $\pi(x) = 0.5\mathcal{N}(x;-1,1) + 0.5\mathcal{N}(x; 1,1)$ to the standard Gaussian reference density $\eta$.
  • Figure 2: Objective function $\widehat{\mathcal{L}}_{1} = \widehat{\mathcal{J}}_{1} \circ \mathcal{R}_{1}$ where the rectifier $\mathcal{R}_1$ is defined using the soft-plus $g$ \ref{eq:def_g} (left), the shifted ELU $g$ \ref{eq:elu} (middle), or the square function $g(\xi)=\xi^2$ (right). Here, $\pi(x) = 1/2\mathcal{N}(x; -2,0.5) +1/2 \mathcal{N}(x; 2,2)$ is a univariate Gaussian mixture, and we use $n = 50$ to define $\mathcal{J}_1$ with $f_1$ represented using a linear combination of Hermite functions up to degree $10$. The objective is evaluated along line segments that interpolate between random initial maps ($t=0$) and critical points resulting from a gradient-based optimization method ($t=1$). Observe that with bijective $g$ (left and middle) the algorithm always arrives at the same optimal value, whereas with the square function $g$ (right) the algorithm gets stuck in local minima and rarely attains the optimal value.
  • Figure 3: A $k=2$ dimensional downward-closed active set of multi-indices $\Lambda_{t}$ with its margin $\Lambda_{t}^{\text{M}}$ and reduced margin $\Lambda_{t}^{\text{RM}}$. The margin and reduced margin are plotted before (left) and after (right) adding $\boldsymbol{\alpha}_t^\ast = (2,1)$, denoted with a cross, to $\Lambda_{t}$.
  • Figure 4: (a) The pullback density $\widehat{S}^\sharp\eta$ approaches the target Gaussian mixture density $\pi$ when increasing the maximum polynomial degree $p$ of the space $V_1^p \subset V_1$. (b) With increasing $p$, the pullback density converges to $\pi$ in KL divergence, and the estimated map converges to $S_{\textrm{KR}}$ in $L^2_\pi$.
  • Figure 5: (a) The approximate transport maps $\widehat{S}$ compared to $S_{\textrm{KR}}$ (black), and (b) the corresponding non-monotone functions $f$ compared to $f_{\textrm{KR}} \coloneqq \mathcal{R}^{-1}(S_{\textrm{KR}})$. Both subfigures illustrate different choices of basis $\{\psi_\alpha\}_\alpha$, for the Gaussian mixture target of Figure \ref{fig:mog_bimodal}. The modified Hermite polynomial basis provides the closest approximation to $S_{\textrm{KR}}$ and $f_{\textrm{KR}}$.
  • ...and 4 more figures
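
The composed objective $\widehat{\mathcal{J}}_1 \circ \mathcal{R}_1$ of Figure 2 and the inverse $\mathcal{R}^{-1}(S_{\textrm{KR}})$ of Figure 5 can be illustrated with a small sketch. It assumes the usual maximum-likelihood objective for a standard Gaussian reference, the sample average of $\tfrac{1}{2}S(x)^2 - \log S'(x)$ (the negative log of the pullback density up to a constant), and restricts $f$ to affine functions $f(x) = a + bx$ so that the soft-plus rectifier has the closed form $S(x) = a + g(b)\,x$. The Gaussian target, sample size, and function names are hypothetical and are not the setup used in the figures.

```python
import numpy as np

def softplus(z):
    # g(z) = log(1 + exp(z)), a bijection from the reals onto (0, inf).
    return np.logaddexp(0.0, z)

def softplus_inv(y):
    # Inverse of softplus on (0, inf): log(exp(y) - 1).
    return np.log(np.expm1(y))

def objective(a, b, x):
    """Sample average of 0.5*S(x)^2 - log S'(x) for S = R(f), f(x) = a + b*x."""
    slope = softplus(b)        # S'(x) = g(f'(x)) = softplus(b) > 0
    S = a + slope * x          # R(f)(x) = f(0) + int_0^x softplus(b) dt
    return np.mean(0.5 * S**2 - np.log(slope))

rng = np.random.default_rng(0)
m, s = 1.0, 2.0                # hypothetical Gaussian target N(m, s^2)
x = rng.normal(m, s, size=5000)

# Unconstrained parameters of the KR map S_KR(x) = (x - m) / s.
a_kr, b_kr = -m / s, softplus_inv(1.0 / s)
# Unconstrained parameters of the identity map S(x) = x.
a_id, b_id = 0.0, softplus_inv(1.0)

loss_kr = objective(a_kr, b_kr, x)
loss_id = objective(a_id, b_id, x)
print("objective at KR map  :", loss_kr)
print("objective at identity:", loss_id)
```

Since the soft-plus is a bijection onto the positive reals, every positive slope corresponds to exactly one unconstrained parameter $b$, which is how the KR map is pulled back to $f_{\textrm{KR}}$ here; the square function $g(\xi) = \xi^2$ would map both $b$ and $-b$ to the same slope.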

Theorems & Definitions (41)

  • Proposition 1
  • Remark 1
  • Proposition 2
  • proof
  • Remark 2
  • Proposition 3
  • proof
  • Theorem 4
  • proof
  • Remark 3: Assumption \ref{eq:AssumptionPiBounded} implies Gaussian tails
  • ...and 31 more