Explicit Density Approximation for Neural Implicit Samplers Using a Bernstein-Based Convex Divergence
José Manuel de Frutos, Manuel A. Vázquez, Pablo M. Olmos, Joaquín Míguez
TL;DR
This work introduces dual-ISL, a convex, likelihood-free objective for training implicit generative models, obtained by swapping the roles of the data and model distributions in the invariant statistical loss (ISL). It establishes that the resulting discrepancy $d_K(p,\tilde p)$ is continuous under weak convergence and convex in the model density $p$, enabling stable optimization in density space. A central contribution is interpreting $d_K$ as the $L^2$ projection of the density ratio $q = p/\tilde p$ onto Bernstein polynomials, yielding an explicit, tractable closed-form density approximation $p_K$ with sharp error bounds and convergence rates. The framework extends naturally to multivariate settings through slicing, which preserves convexity. Empirical results show faster convergence, smoother training, and better mode coverage than standard ISL and competing generative methods, together with explicit density estimates for downstream tasks.
Abstract
Rank-based statistical metrics, such as the invariant statistical loss (ISL), have recently emerged as robust and practically effective tools for training implicit generative models. In this work, we introduce dual-ISL, a novel likelihood-free training objective that interchanges the roles of the target and model distributions in the ISL framework, yielding a convex optimization problem in the space of model densities. We prove that the resulting rank-based discrepancy $d_K$ is i) continuous under weak convergence and with respect to the $L^1$ norm, and ii) convex in its first argument. These properties are not shared by classical divergences such as the KL divergence or Wasserstein distances. Building on this, we develop a theoretical framework that interprets $d_K$ as an $L^2$-projection of the density ratio $q = p/\tilde p$ onto a Bernstein polynomial basis, from which we derive exact bounds on the truncation error, precise convergence rates, and a closed-form expression for the truncated density approximation. We further extend our analysis to the multivariate setting via random one-dimensional projections, defining a sliced dual-ISL divergence that retains both convexity and continuity. We empirically show that these theoretical advantages translate into practical ones. Specifically, across several benchmarks dual-ISL converges more rapidly, delivers markedly smoother and more stable training, and more effectively prevents mode collapse than classical ISL and other leading implicit generative methods, while also providing an explicit density approximation.
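
As a rough illustration of the rank-based idea, the following Python sketch estimates a rank histogram with the roles swapped as described above: each model sample is ranked among $K$ reference samples drawn from the data, and the histogram is compared with the uniform distribution on $\{0, \dots, K\}$. This is a minimal sketch under stated assumptions, not the paper's definition of $d_K$. The function names `rank_pmf` and `uniform_gap`, the resampling of reference points from a finite data set, the choice $K = 10$, and the unnormalized $L^1$ gap are all hypothetical choices made for illustration.

```python
import numpy as np


def rank_pmf(model_samples, data_samples, K, rng):
    """Empirical pmf of the rank of a model sample among K data samples.

    Assumption: the rank is the number of reference (data) samples that
    fall strictly below the model sample, so it takes values in 0..K.
    """
    n = len(model_samples)
    # For each model sample, draw K reference points from the data set.
    ref = rng.choice(data_samples, size=(n, K), replace=True)
    ranks = (ref < model_samples[:, None]).sum(axis=1)
    return np.bincount(ranks, minlength=K + 1) / n


def uniform_gap(pmf):
    """L1 distance between a rank pmf and the uniform pmf on 0..K.

    Assumption: no extra normalization constant is applied here.
    """
    K = len(pmf) - 1
    return np.abs(pmf - 1.0 / (K + 1)).sum()


rng = np.random.default_rng(0)
K = 10
data = rng.normal(loc=0.0, scale=1.0, size=5000)      # stand-in for real data
matched = rng.normal(loc=0.0, scale=1.0, size=5000)   # model equal to the data law
shifted = rng.normal(loc=2.0, scale=1.0, size=5000)   # model with a shifted mean

pmf_matched = rank_pmf(matched, data, K, rng)
pmf_shifted = rank_pmf(shifted, data, K, rng)
d_matched = uniform_gap(pmf_matched)
d_shifted = uniform_gap(pmf_shifted)
print(d_matched, d_shifted)
```

When the model and data distributions coincide, the ranks are approximately uniform and the gap is small. A mismatched model concentrates the ranks at one end and the gap grows. The counting step is not differentiable, so this estimator is only a diagnostic sketch and not a training loss.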
