No Trick, No Treat: Pursuits and Challenges Towards Simulation-free Training of Neural Samplers
Jiajun He, Yuanqi Du, Francisco Vargas, Dinghuai Zhang, Shreyas Padhy, RuiKang OuYang, Carla Gomes, José Miguel Hernández-Lobato
TL;DR
The paper addresses sampling from an unnormalized target density $p_\text{target}$ and investigates whether neural samplers can be trained without simulating trajectories. It systematically reviews diffusion- and control-based samplers, introduces a simulation-free approach based on a time-dependent normalizing flow (NF-DDS), and analyzes its limitations, in particular its mode collapse when Langevin preconditioning is absent. Ablation studies across DDS, CMCD, and PINN-based samplers show that explicit gradient information, via $\nabla \log p_\text{target}$ or a surrogate of it, is crucial for avoiding mode collapse; PINN shows some robustness, but at a higher computational cost. Benchmarked against a strong baseline that combines Parallel Tempering (PT) with a generative model, current neural samplers lag behind traditional MCMC pipelines in practical efficiency, and the work suggests data-informed or warm-start strategies as promising directions for future research.
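To make "Langevin preconditioning" concrete, the sketch below shows where $\nabla \log p_\text{target}$ enters a sampler's drift. It assumes one common parameterization (popularized by the Path Integral Sampler), drift $= f_\theta(x, t) + g_\theta(t)\,\nabla \log p_\text{target}(x)$; the exact form used by each method in the paper may differ. The 1D two-component Gaussian mixture target, the names `free_net` and `scale_net`, and the Euler–Maruyama settings are hypothetical illustrations. This only shows how target information reaches the dynamics; it does not reproduce the paper's training or mode-collapse experiments.

```python
import numpy as np

# Hypothetical target: equal-weight mixture of unit-variance Gaussians at -3 and +3.
MEANS = np.array([-3.0, 3.0])


def grad_log_p_target(x):
    """Score of the mixture: a responsibility-weighted pull towards each mean."""
    x = np.asarray(x, dtype=float)[..., None]
    logits = -0.5 * (x - MEANS) ** 2
    resp = np.exp(logits - np.logaddexp.reduce(logits, axis=-1, keepdims=True))
    return np.sum(resp * (MEANS - x), axis=-1)


def drift(x, t, free_net, scale_net, precondition=True):
    """Sampler drift; with preconditioning the target score enters explicitly."""
    out = free_net(x, t)  # learned term (hypothetical network)
    if precondition:
        out = out + scale_net(t) * grad_log_p_target(x)  # Langevin term
    return out


def euler_maruyama(x0, free_net, scale_net, precondition=True,
                   n_steps=500, dt=0.01, seed=0):
    """Integrate dx = drift(x, t) dt + sqrt(2) dW with a fixed step size."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for k in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = x + dt * drift(x, k * dt, free_net, scale_net, precondition) \
              + np.sqrt(2.0 * dt) * noise
    return x
```

With an untrained (zero) learned term and a unit scale, the preconditioned dynamics reduce to overdamped Langevin dynamics and already move particles started from $\mathcal{N}(0, 1)$ towards both modes, whereas the unpreconditioned drift carries no information about $p_\text{target}$ and must learn it entirely through the training objective.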
Abstract
We consider the sampling problem, where the aim is to draw samples from a distribution whose density is known only up to a normalization constant. Recent breakthroughs in generative modeling for approximating high-dimensional data distributions have sparked significant interest in developing neural network-based methods for this challenging problem. However, neural samplers typically incur heavy computational overhead because they simulate trajectories during training. This motivates the pursuit of simulation-free training procedures for neural samplers. In this work, we propose an elegant modification to previous methods that allows simulation-free training with the help of a time-dependent normalizing flow. However, the resulting sampler ultimately suffers from severe mode collapse. On closer inspection, we find that nearly all successful neural samplers rely on Langevin preconditioning to avoid mode collapse. We systematically analyze several popular methods with various objective functions and demonstrate that, in the absence of Langevin preconditioning, most of them fail to adequately cover even a simple target. Finally, we draw attention to a strong baseline that combines the state-of-the-art MCMC method, Parallel Tempering (PT), with an additional generative model, to shed light on future explorations of neural samplers.
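The PT half of that baseline can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a hypothetical 1D target (an equal-weight mixture of unit-variance Gaussians at $\pm 4$), random-walk Metropolis moves within each tempered chain $p_\text{target}^{\beta}$, and swaps between neighbouring temperatures; the temperature ladder, step sizes, and iteration count are illustrative choices.

```python
import numpy as np

# Hypothetical well-separated bimodal target (unit-variance modes at -4 and +4).
MEANS = np.array([-4.0, 4.0])


def log_p_target(x):
    """Unnormalized log density, evaluated elementwise on an array of 1D states."""
    x = np.asarray(x, dtype=float)[..., None]
    return np.logaddexp.reduce(-0.5 * (x - MEANS) ** 2, axis=-1)


def parallel_tempering(n_iters=20000, betas=(1.0, 0.5, 0.25, 0.1, 0.05), seed=0):
    """Random-walk Metropolis on each p_target**beta plus adjacent-temperature swaps.

    Returns the trajectory of the beta = 1 chain, which targets p_target itself.
    """
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=float)
    step = 1.0 / np.sqrt(betas)       # wider proposals at hotter temperatures
    x = rng.normal(size=betas.size)   # one walker per temperature
    logp = log_p_target(x)
    cold_samples = np.empty(n_iters)
    for it in range(n_iters):
        # 1) Metropolis update, vectorized over temperatures.
        prop = x + step * rng.normal(size=x.size)
        logp_prop = log_p_target(prop)
        accept = np.log(rng.uniform(size=x.size)) < betas * (logp_prop - logp)
        x = np.where(accept, prop, x)
        logp = np.where(accept, logp_prop, logp)
        # 2) Swap states between neighbouring temperatures k and k + 1.
        for k in range(betas.size - 1):
            log_alpha = (betas[k] - betas[k + 1]) * (logp[k + 1] - logp[k])
            if np.log(rng.uniform()) < log_alpha:
                x[[k, k + 1]] = x[[k + 1, k]]
                logp[[k, k + 1]] = logp[[k + 1, k]]
        cold_samples[it] = x[0]
    return cold_samples
```

The hot chains cross the low-density region between the modes easily, and the swaps pass those crossings down to the $\beta = 1$ chain, so its samples visit both modes. The second stage of the baseline, fitting a generative model to such samples, is not sketched here.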
