Explicit Density Approximation for Neural Implicit Samplers Using a Bernstein-Based Convex Divergence

José Manuel de Frutos, Manuel A. Vázquez, Pablo M. Olmos, Joaquín Míguez

TL;DR

This work introduces dual-ISL, a convex, likelihood-free objective for training implicit generative processes by swapping the data and model roles in ISL. It establishes that the resulting discrepancy $d_K(p,\tilde p)$ is continuous under weak convergence and convex in the model density, enabling stable optimization in density space. A central contribution is interpreting $d_K$ as the $L^2$ projection of the density ratio $q= p/\tilde p$ onto Bernstein polynomials, yielding an explicit, tractable closed-form density approximation $p_K$ with sharp error bounds and convergence rates. The framework extends naturally to sliced multivariate settings, preserving convexity, and empirical results show faster convergence, smoother training, and better mode coverage than standard ISL and competing generative methods, while providing explicit density estimates for downstream tasks.

Abstract

Rank-based statistical metrics, such as the invariant statistical loss (ISL), have recently emerged as robust and practically effective tools for training implicit generative models. In this work, we introduce dual-ISL, a novel likelihood-free objective for training implicit generative models that interchanges the roles of the target and model distributions in the ISL framework, yielding a convex optimization problem in the space of model densities. We prove that the resulting rank-based discrepancy $d_K$ is i) continuous under weak convergence and with respect to the $L^1$ norm, and ii) convex in its first argument, properties not shared by classical divergences such as KL or Wasserstein distances. Building on this, we develop a theoretical framework that interprets $d_K$ as an $L^2$-projection of the density ratio $q = p/\tilde p$ onto a Bernstein polynomial basis, from which we derive exact bounds on the truncation error, precise convergence rates, and a closed-form expression for the truncated density approximation. We further extend our analysis to the multivariate setting via random one-dimensional projections, defining a sliced dual-ISL divergence that retains both convexity and continuity. We empirically show that these theoretical advantages translate into practical ones. Specifically, across several benchmarks dual-ISL converges more rapidly, delivers markedly smoother and more stable training, and more effectively prevents mode collapse than classical ISL and other leading implicit generative methods, while also providing an explicit density approximation.

Paper Structure

This paper contains 41 sections, 13 theorems, 102 equations, 17 figures, 6 tables, 2 algorithms.

Key Result

Theorem 2.1

If $p = \tilde{p}$, then $A_{K}$ is uniformly distributed on $\{0,\ldots,K\}$, i.e. $\mathbb{Q}_{K}(n)=\tfrac{1}{K+1}$ for all $n \in \{0,\ldots,K\}$.
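
Theorem 2.1 can be checked numerically with a short simulation. The sketch below assumes a definition that this page does not spell out: $A_K$ is taken to be the number of $K$ i.i.d. reference samples that fall below a single independent test sample. Under that assumption, and for continuous distributions, $A_K$ given the test sample $y$ is Binomial with success probability equal to the reference CDF at $y$. When both samples come from the same distribution this probability is uniform on $[0,1]$, so $\mathbb{Q}_K(n)=\int_0^1 \binom{K}{n}u^n(1-u)^{K-n}\,du=\tfrac{1}{K+1}$. The function and sampler names (`rank_statistic_pmf`, `standard_normal`, `shifted_normal`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def rank_statistic_pmf(sample_ref, sample_test, K, trials, rng):
    """Empirical pmf of the rank statistic A_K on {0, ..., K}.

    Assumed definition (not taken from the paper): A_K counts how many of
    K i.i.d. reference draws fall below one independent test draw.
    """
    refs = sample_ref(rng, (trials, K))   # K reference draws per trial
    y = sample_test(rng, (trials, 1))     # one test draw per trial
    ranks = (refs < y).sum(axis=1)        # values in {0, ..., K}
    counts = np.bincount(ranks, minlength=K + 1)
    return counts / trials

def standard_normal(rng, shape):
    return rng.normal(loc=0.0, scale=1.0, size=shape)

def shifted_normal(rng, shape):
    # Illustrative mismatched distribution: same scale, mean moved to 1.
    return rng.normal(loc=1.0, scale=1.0, size=shape)

rng = np.random.default_rng(0)
K = 10
trials = 200_000

# Matching distributions: the pmf should be close to 1 / (K + 1) everywhere.
pmf_equal = rank_statistic_pmf(standard_normal, standard_normal, K, trials, rng)

# Mismatched distributions: the pmf is visibly non-uniform.
pmf_shift = rank_statistic_pmf(standard_normal, shifted_normal, K, trials, rng)

print("uniform value :", round(1.0 / (K + 1), 4))
print("p == p_tilde  :", np.round(pmf_equal, 4))
print("p != p_tilde  :", np.round(pmf_shift, 4))
```

In the matched case the empirical frequencies fluctuate around $1/(K+1)$ at the Monte Carlo noise level. In the mismatched case the mass shifts toward large ranks, which is the kind of deviation from uniformity that a rank-based discrepancy such as $d_K$ is designed to measure.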

Figures (17)

  • Figure 1: Comparison of dual-ISL, standard ISL, and MMD-GAN for modeling a mixture of Pareto and Normal distributions. The three subfigures display, respectively, the dual-ISL results, the performance of the standard ISL approach, and the outcomes obtained via MMD-GAN. Further comparisons, including diffusion models and additional target distributions, are provided in the appendix on supplementary experiments.
  • Figure 2: Empirical convergence of ISL’s Bernstein approximation (cf. the equation on the convergence of $q_K$ to 1). The solid blue curve shows the mean theoretical upper bound $\|q_K - 1\|_\infty\le (K+1)^2d_K$, and the dashed red curve shows the observed $\|q - 1\|_\infty$.
  • Figure 3: Dual-ISL density estimation results. (a) On a 1D Cauchy target, dual-ISL (blue) closely matches the true density (red) and outperforms the KDE baseline (green). (b) On a 2D two-moons dataset, dual-ISL accurately captures the manifold structure, with learned contours aligning tightly with the sample cloud.
  • Figure 4: One‐dimensional density estimation across six benchmark targets. Each row corresponds to a different true distribution (top to bottom: $\mathcal{N}(4,2)$, Cauchy$(1,2)$, Pareto$(1,1)$, Model$_1$, Model$_2$, Model$_3$). In each subplot, the red curve shows the ground‐truth density and the blue curve shows the model’s estimated density. Columns (left to right) compare dual‐ISL, classical ISL, WGAN, MMD‐GAN, and a DDPM diffusion baseline, respectively. Dual‐ISL more accurately captures multi‐modal and heavy‐tailed shapes, with reduced mode‐collapse and smoother estimates.
  • Figure 5: Comparison of learned generator mappings $f_{\theta}(z)$ against the true probability‐integral‐transform $f_{+}(z)$ or $f_{-}(z)$ for Model$_3$. Dual-ISL closely follows the ideal map even in the heavy‐tailed region, while ISL and MMD-GAN display growing errors, particularly near the mode boundaries and in the Pareto tail.
  • ...and 12 more figures

Theorems & Definitions (34)

  • Theorem 2.1
  • Theorem 2.2: Continuity
  • Theorem 2.3: Identifiability
  • Theorem 3.1: Continuity under weak convergence
  • proof
  • Theorem 3.2: Convexity
  • proof
  • Definition 4.1: Binomial mapping
  • Theorem 4.1
  • proof
  • ...and 24 more