Sequential transport maps using SoS density estimation and $α$-divergences

Benjamin Zanger; Olivier Zahm; Tiangang Cui; Martin Schreiber

TL;DR

This work provides a new convergence analysis of sequential transport maps based on information-geometric properties of $\alpha$-divergences and explores the use of Sum-of-Squares densities and $\alpha$-divergences for approximating the intermediate densities.

Abstract

Transport-based density estimation methods are receiving growing interest because of their ability to efficiently generate samples from the approximated density. We further investigate the sequential transport maps framework proposed in arXiv:2106.04170 and arXiv:2303.02554, which builds on a sequence of composed Knothe-Rosenblatt (KR) maps. Each of those maps is built by first estimating an intermediate density of moderate complexity, and then computing the exact KR map from a reference density to the precomputed approximate density. In our work, we explore the use of Sum-of-Squares (SoS) densities and $α$-divergences for approximating the intermediate densities. Combining SoS densities with $α$-divergences yields convex optimization problems which can be efficiently solved using semidefinite programming. The main advantage of $α$-divergences is that they enable working with unnormalized densities, which provides benefits both numerically and theoretically. In particular, we provide a new convergence analysis of the sequential transport maps based on information-geometric properties of $α$-divergences. The choice of intermediate densities is also crucial for the efficiency of the method. While tempered (or annealed) densities are the state of the art, we introduce diffusion-based intermediate densities, which make it possible to approximate densities known from samples only. Such intermediate densities are well established in machine learning for generative modeling. Finally, we propose low-dimensional maps (or lazy maps) for dealing with high-dimensional problems and numerically demonstrate our methods on Bayesian inference problems and unsupervised learning tasks.
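To illustrate what working with unnormalized densities means in practice, here is a minimal Python sketch of an $α$-divergence evaluated on a grid. It assumes the common form $\phi_\alpha(t)=\frac{t^\alpha-1}{\alpha(\alpha-1)}-\frac{t-1}{\alpha-1}$ and $D_\alpha(f\|g)=\int \phi_\alpha(f/g)\,g\,\mathrm{d}x$; the exact definitions in the paper may differ, and the function names `phi_alpha` and `alpha_divergence` are hypothetical.

```python
import numpy as np

def phi_alpha(t, alpha):
    """Assumed generator of the alpha-divergence for unnormalized densities.

    Valid for alpha not in {0, 1}; convex in t with phi_alpha(1) = 0.
    """
    return (t**alpha - 1.0) / (alpha * (alpha - 1.0)) - (t - 1.0) / (alpha - 1.0)

def alpha_divergence(f, g, x, alpha):
    """Approximate D_alpha(f || g) = int phi_alpha(f/g) g dx on a uniform grid x."""
    dx = x[1] - x[0]
    return float(np.sum(phi_alpha(f / g, alpha) * g) * dx)

x = np.linspace(-8.0, 8.0, 4001)
g = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)  # normalized Gaussian density

# The divergence vanishes when both arguments coincide ...
d_same = alpha_divergence(g, g, x, alpha=2.0)
# ... and is strictly positive when f differs from g only by a scaling factor,
# so the mass of an unnormalized density is penalized as well.
d_scaled = alpha_divergence(3.0 * g, g, x, alpha=2.0)
```

With these assumptions, a pure rescaling $f = c\,g$ gives $D_\alpha(f\|g)=\phi_\alpha(c)\int g$, which is why no normalization constant has to be known in advance.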

Paper Structure (25 sections, 8 theorems, 103 equations, 10 figures, 3 tables, 1 algorithm)

Introduction
Variational density estimation using $\alpha$-divergence
Sum-of-Squares densities
Orthonormal tensorized basis for Sum-of-Squares
Integration of SoS functions
SoS densities and Knothe--Rosenblatt map
Conditional SoS maps
Sequential transport maps using $\alpha$-divergence
Validation of Assumption \ref{assu:bound_divergence_of_bridging_densities}
Geometric interpretation of Assumption \ref{assump:symetric_proj_diff}
Implementing sequential maps for high-dimensional and data-driven problems
Towards high dimensions using subspace projections: the lazy maps
Sequential transport maps from data using diffusion
Numerical Examples
Multimodal density from data by diffusion process
...and 10 more sections

Key Result

Proposition 1

Let $\mathcal{L}$ be the integration operator over the variable $x_\ell$ with $\ell\in\{1,\ldots,d\}$ defined by […]. Let $g_A$ be a SoS function as in \ref{eq:SoS_function} with $\rho_{\mathrm{ref}}(\bm x)=\prod_{i=1}^d {\rho_{\mathrm{ref}}}_i(x_i)$ and $(\Phi(\bm x))_{\sigma(\alpha)}=\prod_{i=1}^d \phi_{\alpha_{i}}^i(x_i)$, where $\{\phi_{1}^\ell,\phi_{2
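As a small illustration of why integrating SoS functions is cheap, the following Python sketch treats the simplest one-dimensional case. It assumes a SoS function of the form $g_A(x)=\Phi(x)^\top A\,\Phi(x)\,\rho_{\mathrm{ref}}(x)$ with a positive semidefinite matrix $A$, a uniform reference density on $[-1,1]$, and a Legendre basis orthonormal with respect to that reference. Under these assumptions orthonormality gives $\int g_A = \operatorname{trace}(A)$. The helper names `basis` and `sos` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def basis(x, n):
    """First n Legendre polynomials, orthonormal w.r.t. rho_ref(x) = 1/2 on [-1, 1]."""
    x = np.atleast_1d(x)
    return np.stack([np.sqrt(2 * k + 1) * legendre.legval(x, np.eye(n)[k])
                     for k in range(n)])  # shape (n, len(x))

def sos(x, A):
    """Assumed SoS function g_A(x) = Phi(x)^T A Phi(x) * rho_ref(x), rho_ref = 1/2."""
    Phi = basis(x, A.shape[0])
    return 0.5 * np.einsum("ix,ij,jx->x", Phi, A, Phi)

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T  # positive semidefinite by construction

# Gauss-Legendre quadrature with 16 nodes is exact for this degree-6 polynomial.
nodes, weights = legendre.leggauss(16)
integral = float(np.sum(weights * sos(nodes, A)))
trace = float(np.trace(A))

# Nonnegativity follows from A being positive semidefinite.
min_value = float(sos(np.linspace(-1.0, 1.0, 201), A).min())
```

Because the integral reduces to a linear function of $A$, a normalization constraint or mass term stays linear in the unknown matrix, which is consistent with the convex, semidefinite formulation mentioned in the abstract.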

Figures (10)

Figure 1: Visualization of the approximation of a bimodal density $\pi$ (right) using $L=3$ intermediate tempered densities estimated using SoS \ref{eq:approx_class_SoS} and a Gaussian reference density $\rho_{\mathrm{ref}}$.
Figure 2: Left: functions $\phi_{\alpha}^{n}(t)$ as in Eq. \ref{eq:def_alpha_divergences_normalized} associated with $\alpha$-divergences for normalized densities. Right: functions $\phi_{\alpha}(t)=\phi_{\alpha}^{n}(t)-\frac{t-1}{\alpha-1}$ as in Eq. \ref{eq:def_alpha_divergences} associated with $\alpha$-divergences for unnormalized densities. The affine term $-\frac{t-1}{\alpha-1}$ preserves convexity and, in contrast to $\phi_{\alpha}^{n}$, ensures that $\phi_{\alpha}$ attains its minimum at $t=1$.
Figure 3: Visualization of the construction of $\overline{\mathcal{Q}}_A$ from a given density $\pi_A$. First, one computes the required marginals of $\pi_A$ and the conditionals $\pi_A(x_i|x_{1}, \dots, x_{i-1}) = \pi_A(x_i | x_{\leq i-1})$. Then, the CDFs are obtained by computing the antiderivatives, which gives access to the RT. The inverse RT is constructed by inverting the CDFs according to formula \ref{eq:IRT}.
Figure 4: Visual comparison of tempered (top) and diffusion based (bottom) bridging densities for a banana distribution (right) with a Gaussian reference distribution (left).
Figure 5: Visualization of the $\alpha$-geodesic going through $\pi^{(\ell)}$ and $\pi^{(\ell+1)}$ as well as the approximation submanifold $\mathcal{M}$ with the approximation $\widetilde{\pi}^{(\ell)}$ being the $\alpha$-projection of $\pi^{(\ell)}$ onto $\mathcal{M}$. The $\alpha^*$-projection back on the $\alpha$-geodesic is used in order to use the generalized Pythagorean theorem between $\widetilde{\pi}^{(\ell)}$, $f_{\text{proj}}^{(\ell)}$, and $\pi^{(\ell+1)}$.
...and 5 more figures
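The CDF-inversion step described in the caption of Figure 3 can be sketched in one dimension as follows. This is a simplified illustration rather than the paper's construction: it assumes a density tabulated on a grid, uses trapezoidal integration instead of exact antiderivatives, and takes a uniform reference on $[0,1]$ instead of a Gaussian one. The function name `inverse_cdf_map` and the bimodal target are hypothetical.

```python
import numpy as np

def inverse_cdf_map(x, density):
    """Return T with T(u) distributed as `density` when u ~ Uniform(0, 1).

    `density` may be unnormalized; it is tabulated on the increasing grid x.
    """
    # Antiderivative via cumulative trapezoidal rule.
    increments = 0.5 * (density[1:] + density[:-1]) * np.diff(x)
    cdf = np.concatenate([[0.0], np.cumsum(increments)])
    cdf /= cdf[-1]  # normalization happens here, not beforehand
    # Inverting the monotone CDF by linear interpolation.
    return lambda u: np.interp(u, cdf, x)

x = np.linspace(-5.0, 5.0, 2001)
# Unnormalized, symmetric bimodal target (illustrative choice).
target = np.exp(-0.5 * ((x - 1.5) / 0.7) ** 2) + np.exp(-0.5 * ((x + 1.5) / 0.7) ** 2)

T = inverse_cdf_map(x, target)
rng = np.random.default_rng(0)
samples = T(rng.uniform(size=20000))  # push reference samples through the map
median = float(T(0.5))
```

In $d$ dimensions the same inversion is applied coordinate by coordinate to the conditional CDFs, which yields the triangular structure of the KR map.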

Theorems & Definitions (22)

Remark 1
Example 1: Orthogonal polynomials
Example 2: Transformed Legendre polynomials \cite{shen_approximations_2014}
Proposition 1: Integration over one variable
proof
Corollary 1: Integration over several variables
Remark 2
Remark 3: Non-orthonormal basis
Remark 4: $\mathcal{M}_{\text{SoS}}$ contains the identity map
Example 3: Defining $\rho_{\mathrm{ref}}$ on indefinite domains
...and 12 more
