No Trick, No Treat: Pursuits and Challenges Towards Simulation-free Training of Neural Samplers

Jiajun He, Yuanqi Du, Francisco Vargas, Dinghuai Zhang, Shreyas Padhy, RuiKang OuYang, Carla Gomes, José Miguel Hernández-Lobato

TL;DR

The paper addresses sampling from an unnormalized target density $p_\text{target}$ and investigates whether neural samplers can be trained without trajectory simulations. It provides a systematic review of diffusion- and control-based samplers, introducing a simulation-free approach using a time-dependent normalizing flow (NF-DDS) and analyzing its limitations, especially in the absence of Langevin preconditioning. Ablation studies across DDS, CMCD, and PINN reveal that explicit gradient information via $\nabla \log p_\text{target}$ or its surrogate is crucial to avoid mode collapse, with PINN showing some robustness but at higher computational cost. Benchmarking against a strong baseline formed by Parallel Tempering (PT) plus a generative model, the work concludes that current neural samplers lag behind traditional MCMC pipelines in practical efficiency, and suggests data-informed or warm-start strategies as promising directions for future work.

Abstract

We consider the sampling problem, where the aim is to draw samples from a distribution whose density is known only up to a normalization constant. Recent breakthroughs in generative modeling to approximate a high-dimensional data distribution have sparked significant interest in developing neural network-based methods for this challenging problem. However, neural samplers typically incur heavy computational overhead due to simulating trajectories during training. This motivates the pursuit of simulation-free training procedures for neural samplers. In this work, we propose an elegant modification to previous methods, which allows simulation-free training with the help of a time-dependent normalizing flow. However, it ultimately suffers from severe mode collapse. On closer inspection, we find that nearly all successful neural samplers rely on Langevin preconditioning to avoid mode collapse. We systematically analyze several popular methods with various objective functions and demonstrate that, in the absence of Langevin preconditioning, most of them fail to adequately cover even a simple target. Finally, we draw attention to a strong baseline by combining the state-of-the-art MCMC method, Parallel Tempering (PT), with an additional generative model to shed light on future explorations of neural samplers.
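
To make the role of gradient information concrete, here is a minimal NumPy sketch under stated assumptions; it is not the paper's implementation. It shows that $\nabla \log p_\text{target}$ can be evaluated without the normalization constant, and how a drift might mix a learned term with that gradient, which is one common reading of Langevin preconditioning. The two-mode toy target and the names `MEANS`, `log_p_unnorm`, `score`, `preconditioned_drift`, `f_theta`, and `g_theta` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy target (an assumption, not the paper's GMM-40):
# equal-weight mixture of two unit-variance Gaussians in 2D.
MEANS = np.array([[-3.0, 0.0], [3.0, 0.0]])


def _sq_dists(x):
    """Squared distance from each point in x (n, 2) to each mean, shape (n, 2)."""
    return ((x[:, None, :] - MEANS[None, :, :]) ** 2).sum(axis=-1)


def log_p_unnorm(x):
    """Unnormalized log-density: log sum_k exp(-||x - mu_k||^2 / 2)."""
    logits = -0.5 * _sq_dists(x)
    m = logits.max(axis=1, keepdims=True)  # stabilise the log-sum-exp
    return (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))[:, 0]


def score(x):
    """Gradient of log_p_unnorm; the normalization constant never appears."""
    logits = -0.5 * _sq_dists(x)
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)  # per-point mixture responsibilities
    return (w[:, :, None] * (MEANS[None, :, :] - x[:, None, :])).sum(axis=1)


def preconditioned_drift(x, t, f_theta, g_theta):
    """Assumed drift form: learned term plus a learned scale times the score.

    f_theta(x, t) and g_theta(t) stand in for neural networks.
    """
    return f_theta(x, t) + g_theta(t) * score(x)
```

Passing `f_theta = lambda x, t: np.zeros_like(x)` and `g_theta = lambda t: 1.0` reduces the drift to the plain score, so even an untrained model in this form is already pulled toward the modes of the target.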

Paper Structure

This paper contains 18 sections, 36 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Sample quality vs target evaluation times for different approaches with different objectives on GMM-40 target. *NETS uses mode interpolation, which is distinct from that employed in others.
  • Figure 2: Samples obtained by PINN with different settings. PINN appears highly robust to whether Langevin preconditioning is used, but it is highly sensitive to the prior and interpolation.
  • Figure 3: Samples obtained by DDS with different settings. The first line shows the initialization.
  • Figure 4: Samples obtained by CMCD with different settings. The first line shows the initialization, and N/A indicates divergence. When trained with Langevin preconditioning, CMCD already captures the modes at initialization.
  • Figure 5: Sample quality (MMD) of NETS trained with the PINN loss (Albergo & Vanden-Eijnden, 2024), both with and without LG in the simulation process during training. Because NETS uses a different prior and interpolation ($\mathcal{N}(0, 2I)$, mode interpolation) from CMCD ($\mathcal{N}(0, 30^2I)$, geometric interpolation), we present results under both settings for a fair comparison. N/A indicates divergence.
  • ...and 1 more figures