Sequential transport maps using SoS density estimation and $α$-divergences

Benjamin Zanger, Olivier Zahm, Tiangang Cui, Martin Schreiber

TL;DR

This work provides a new convergence analysis of sequential transport maps based on information-geometric properties of $\alpha$-divergences, and explores the use of Sum-of-Squares densities and $\alpha$-divergences for approximating the intermediate densities.

Abstract

Transport-based density estimation methods are receiving growing interest because of their ability to efficiently generate samples from the approximated density. We further investigate the sequential transport maps framework proposed in arXiv:2106.04170 and arXiv:2303.02554, which builds on a sequence of composed Knothe-Rosenblatt (KR) maps. Each of these maps is built by first estimating an intermediate density of moderate complexity, and then computing the exact KR map from a reference density to the precomputed approximate density. In our work, we explore the use of Sum-of-Squares (SoS) densities and $α$-divergences for approximating the intermediate densities. Interestingly, combining SoS densities with $α$-divergences yields convex optimization problems that can be solved efficiently using semidefinite programming. The main advantage of $α$-divergences is that they enable working with unnormalized densities, which provides both numerical and theoretical benefits. In particular, we provide a new convergence analysis of the sequential transport maps based on information-geometric properties of $α$-divergences. The choice of intermediate densities is also crucial for the efficiency of the method. While tempered (or annealed) densities are the state of the art, we introduce diffusion-based intermediate densities, which make it possible to approximate densities known only from samples. Such intermediate densities are well established in machine learning for generative modeling. Finally, we propose low-dimensional maps (or lazy maps) for dealing with high-dimensional problems and numerically demonstrate our methods on Bayesian inference problems and unsupervised learning tasks.
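The claim that $α$-divergences allow working with unnormalized densities can be checked numerically. The sketch below assumes the convention $\phi_{\alpha}^{n}(t)=\frac{t^\alpha-1}{\alpha(\alpha-1)}$, which is not stated in this summary but is consistent with the affine correction $-\frac{t-1}{\alpha-1}$ from the Figure 2 caption (it makes $\phi_\alpha'(1)=0$), together with $D(p\,\|\,q)=\int q\,\phi(p/q)\,\mathrm{d}x$. It compares both forms when $p$ has the shape of $q$ but only half its mass. The grid, densities and function names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def phi_normalized(t, alpha):
    """Assumed phi^n_alpha(t) = (t^alpha - 1) / (alpha (alpha - 1)), alpha not in {0, 1}."""
    return (t**alpha - 1.0) / (alpha * (alpha - 1.0))

def phi_unnormalized(t, alpha):
    """phi_alpha(t) = phi^n_alpha(t) - (t - 1)/(alpha - 1): convex, phi(1) = phi'(1) = 0."""
    return phi_normalized(t, alpha) - (t - 1.0) / (alpha - 1.0)

def alpha_divergence(p, q, dx, alpha, phi=phi_unnormalized):
    """D(p || q) = int q(x) phi(p(x) / q(x)) dx, approximated by a Riemann sum on a uniform grid."""
    return np.sum(q * phi(p / q, alpha)) * dx

x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]
q = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)  # normalized standard Gaussian
p = 0.5 * q                                       # same shape, only half of the mass

alpha = 2.0
d_unnorm = alpha_divergence(p, q, dx, alpha)                    # about 0.125, positive
d_norm = alpha_divergence(p, q, dx, alpha, phi=phi_normalized)  # about -0.375, negative
```

The normalized form turns negative once the total mass is free, so it is no longer a valid discrepancy; the corrected $\phi_\alpha$ is strictly convex with its minimum at $t=1$, so the divergence stays nonnegative and vanishes only when $p=q$.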

Paper Structure (25 sections, 8 theorems, 103 equations, 10 figures, 3 tables, 1 algorithm)

Key Result

Proposition 1

Let $\mathcal{L}$ be the integration operator over the variable $x_\ell$ with $\ell\in\{1,\dots,d\}$ defined by … Let $g_A$ be a SoS function as in eq:SoS_function with $\rho_{\mathrm{ref}}(\bm x)=\prod_{i=1}^d \rho_{\mathrm{ref},i}(x_i)$ and $(\Phi(\bm x))_{\sigma(\alpha)}=\prod_{i=1}^d \phi_{\alpha_{i}}^i(x_i)$, where $\{\phi_{1}^\ell,\phi_{2
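The statement of Proposition 1 is truncated above, so the following is only a sketch of the mechanism such a result plausibly rests on, under assumptions made explicit here: $g_A(\bm x)=\rho_{\mathrm{ref}}(\bm x)\,\Phi(\bm x)^\top A\,\Phi(\bm x)$ with $A$ positive semidefinite, $d=2$, a full tensor-product basis, and one-dimensional factors orthonormal with respect to $\rho_{\mathrm{ref},i}$ (here normalized probabilists' Hermite polynomials with a standard Gaussian reference, cf. Example 1). Under these assumptions, integrating out $x_2$ reduces to a partial trace of $A$ over the second tensor factor. The helper names are hypothetical; the check compares the closed form against Gauss-Hermite quadrature.

```python
import numpy as np
from math import factorial, pi, sqrt
from numpy.polynomial.hermite_e import hermeval, hermegauss

def orthonormal_hermite(x, n):
    """Rows k = 0..n-1 hold He_k(x) / sqrt(k!), orthonormal w.r.t. the standard Gaussian."""
    x = np.atleast_1d(x)
    out = np.empty((n, x.size))
    for k in range(n):
        coeffs = np.zeros(k + 1)
        coeffs[k] = 1.0
        out[k] = hermeval(x, coeffs) / sqrt(factorial(k))
    return out

def gauss_pdf(x):
    return np.exp(-0.5 * x**2) / sqrt(2.0 * pi)

def sos_2d(x1, x2, A, n):
    """Assumed SoS form g_A(x) = rho(x1) rho(x2) Phi(x)^T A Phi(x), Phi = phi(x1) kron phi(x2)."""
    phi = np.kron(orthonormal_hermite(x1, n)[:, 0], orthonormal_hermite(x2, n)[:, 0])
    return gauss_pdf(x1) * gauss_pdf(x2) * (phi @ A @ phi)

def integrate_out_x2(A, n):
    """Partial trace over the x2 factor: A'[i, j] = sum_k A[(i, k), (j, k)]."""
    return np.einsum("ikjk->ij", A.reshape(n, n, n, n))

# Usage: random PSD coefficient matrix with n = 3 basis functions per variable.
rng = np.random.default_rng(0)
n = 3
B = rng.standard_normal((n * n, n * n))
A = B @ B.T
A_marg = integrate_out_x2(A, n)

# Quadrature in x2: int g dx2 = int exp(-t^2/2) [g / (gauss_pdf(t) sqrt(2 pi))] dt.
x1 = 0.7
nodes, weights = hermegauss(20)  # nodes/weights for the weight function exp(-x^2/2)
numeric = sum(w * sos_2d(x1, t, A, n) / gauss_pdf(t) for t, w in zip(nodes, weights)) / sqrt(2.0 * pi)
p1 = orthonormal_hermite(x1, n)[:, 0]
closed_form = gauss_pdf(x1) * (p1 @ A_marg @ p1)
```

Because the partial trace of a positive semidefinite matrix is again positive semidefinite, the marginal is still an SoS function with a nonnegative integrand, which is what makes this class convenient for the marginals and conditionals a KR map needs.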

Figures (10)

  • Figure 1: Visualization of the approximation of a bimodal density $\pi$ (right) using $L=3$ intermediate tempered densities estimated using SoS (eq:approx_class_SoS) and a Gaussian reference density $\rho_{\mathrm{ref}}$.
  • Figure 2: Left: functions $\phi_{\alpha}^{n}(t)$ as in Eq. (eq:def_alpha_divergences_normalized) associated with $\alpha$-divergences for normalized densities. Right: functions $\phi_{\alpha}(t)=\phi_{\alpha}^{n}(t)-\frac{t-1}{\alpha-1}$ as in Eq. (eq:def_alpha_divergences) associated with $\alpha$-divergences for unnormalized densities. The affine term $-\frac{t-1}{\alpha-1}$ preserves convexity and ensures that, unlike $\phi_{\alpha}^{n}$, $\phi_{\alpha}$ attains its minimum at $t=1$.
  • Figure 3: Visualization of the construction of $\overline{\mathcal{Q}}_A$ from a given density $\pi_A$. First, one computes the required marginals of $\pi_A$ and the conditionals $\pi_A(x_i \mid x_{1}, \dots, x_{i-1}) = \pi_A(x_i \mid x_{\leq i-1})$. Then, the CDFs are obtained by computing the antiderivatives, which gives access to the RT. The inverse RT is constructed by inverting the CDFs according to formula (eq:IRT).
  • Figure 4: Visual comparison of tempered (top) and diffusion-based (bottom) bridging densities for a banana distribution (right) with a Gaussian reference distribution (left).
  • Figure 5: Visualization of the $\alpha$-geodesic going through $\pi^{(\ell)}$ and $\pi^{(\ell+1)}$, as well as the approximation submanifold $\mathcal{M}$, with the approximation $\widetilde{\pi}^{(\ell)}$ being the $\alpha$-projection of $\pi^{(\ell)}$ onto $\mathcal{M}$. The $\alpha^*$-projection back onto the $\alpha$-geodesic is needed to apply the generalized Pythagorean theorem to $\widetilde{\pi}^{(\ell)}$, $f_{\text{proj}}^{(\ell)}$, and $\pi^{(\ell+1)}$.
  • ...and 5 more figures
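Figure 3 describes building the inverse RT by marginalizing, forming conditionals, integrating them into CDFs, and inverting those CDFs. The sketch below mimics these steps in 2D for an arbitrary unnormalized density. It is a stand-in under stated assumptions: the paper works with SoS densities whose marginals and CDFs are available in closed form, whereas here the integrals are trapezoidal approximations on a grid and the inverse CDFs use linear interpolation. The test density, grid bounds and helper names are hypothetical. The uniform inputs could equally be obtained from Gaussian reference samples through the standard normal CDF.

```python
import numpy as np

def trapezoid(values, grid):
    """Trapezoidal integral of `values` along its last axis."""
    dx = np.diff(grid)
    return np.sum(0.5 * (values[..., 1:] + values[..., :-1]) * dx, axis=-1)

def cdf_on_grid(values, grid):
    """Normalized CDF along the last axis (cumulative trapezoidal antiderivative)."""
    dx = np.diff(grid)
    increments = 0.5 * (values[..., 1:] + values[..., :-1]) * dx
    zeros = np.zeros(values.shape[:-1] + (1,))
    cdf = np.concatenate([zeros, np.cumsum(increments, axis=-1)], axis=-1)
    return cdf / cdf[..., -1:]

def invert_rows(cdf_rows, u, grid):
    """Row-wise inverse CDF by linear interpolation between bracketing grid points."""
    k = np.clip((cdf_rows < u[:, None]).sum(axis=1), 1, grid.size - 1)
    rows = np.arange(cdf_rows.shape[0])
    c0, c1 = cdf_rows[rows, k - 1], cdf_rows[rows, k]
    w = (u - c0) / np.maximum(c1 - c0, 1e-300)
    return grid[k - 1] + w * (grid[k] - grid[k - 1])

def inverse_rosenblatt_2d(density, u, grid):
    """Map u in [0, 1)^2 to samples of an unnormalized 2D density: invert the
    marginal CDF of x1, then the conditional CDF of x2 given the sampled x1."""
    values = density(grid[:, None], grid[None, :])     # values[i, j] = pi(grid[i], grid[j])
    marginal_x1 = trapezoid(values, grid)               # integrate out x2
    x1 = np.interp(u[:, 0], cdf_on_grid(marginal_x1, grid), grid)
    conditional = density(x1[:, None], grid[None, :])   # unnormalized pi(x2 | x1), one row per sample
    x2 = invert_rows(cdf_on_grid(conditional, grid), u[:, 1], grid)
    return np.column_stack([x1, x2])

def target(x1, x2):
    """Unnormalized Gaussian with unit variances and correlation 0.8 (test case only)."""
    return np.exp(-0.5 * (x1**2 - 1.6 * x1 * x2 + x2**2) / 0.36)

rng = np.random.default_rng(1)
u = rng.uniform(size=(5000, 2))
grid = np.linspace(-6.0, 6.0, 301)
samples = inverse_rosenblatt_2d(target, u, grid)
# Sample mean should be near (0, 0) and sample covariance near [[1, 0.8], [0.8, 1]].
```

The map is triangular by construction: $x_1$ depends only on $u_1$, and $x_2$ depends on $(u_1, u_2)$, which is the structure of the KR map that the figure builds one coordinate at a time.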

Theorems & Definitions (22)

  • Remark 1
  • Example 1: Orthogonal polynomials
  • Example 2: Transformed Legendre polynomials shen_approximations_2014
  • Proposition 1: Integration over one variable
  • proof
  • Corollary 1: Integration over several variables
  • Remark 2
  • Remark 3: Non-orthonormal basis
  • Remark 4: $\mathcal{M}_{\text{SoS}}$ contains the identity map
  • Example 3: Defining $\rho_{\mathrm{ref}}$ on indefinite domains
  • ...and 12 more