Importance Weighted Score Matching for Diffusion Samplers with Enhanced Mode Coverage
Chenguang Wang, Xiaoyu Zhang, Kaiyuan Cui, Weichen Zhao, Yongtao Guan, Tianshu Yu
TL;DR
The paper tackles the challenge of sampling from unnormalized densities without target samples, aiming for comprehensive mode coverage. It introduces Importance Weighted Score Matching (IWSM) to optimize a forward-KL–like objective for diffusion samplers by re-weighting the score-matching loss with self-normalized importance-sampling estimates, using a sampler-induced proposal built from a replay buffer. The authors provide bias-variance analyses for the score estimator and SNIS weights, and validate the method on increasingly complex multi-modal distributions, including high-dimensional particle systems, where it achieves state-of-the-art performance across multiple distributional metrics. The approach demonstrates strong mode coverage, computational efficiency via amortized score estimation, and practical viability for data-free energies, with clear avenues for scaling and acceleration in future work.
Abstract
Training neural samplers directly from unnormalized densities, without access to samples from the target distribution, presents a significant challenge. A critical desideratum in these settings is comprehensive mode coverage: the sampler should capture the full diversity of the target distribution. However, prevailing methods often circumvent the lack of target data by optimizing reverse KL-based objectives. Such objectives are inherently mode-seeking and can lead to an incomplete representation of the underlying distribution. While alternative approaches strive for better mode coverage, they typically rely on implicit mechanisms such as heuristics or iterative refinement. In this work, we propose a principled approach for training diffusion-based samplers that directly targets an objective analogous to the forward KL divergence, which is known to encourage mode coverage. We introduce Importance Weighted Score Matching, a method that optimizes this mode-covering objective by re-weighting the score matching loss with tractable importance sampling estimates, thereby overcoming the absence of target distribution data. We also provide a theoretical analysis of the bias and variance of our proposed Monte Carlo estimator and of the practical loss function used in our method. Experiments on increasingly complex multi-modal distributions, including 2D Gaussian Mixture Models with up to 120 modes and challenging particle systems with inherent symmetries, demonstrate that our approach consistently outperforms existing neural samplers across all distributional distance metrics, achieving state-of-the-art results on all benchmarks.
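To make the re-weighting idea concrete, the following is a minimal sketch of self-normalized importance sampling (SNIS) weights, not the authors' implementation. The 1D two-mode target, the wide Gaussian proposal, and the names `snis_weights`, `log_target_unnorm`, and `log_proposal` are all assumptions introduced for illustration; in the paper's setting the proposal is induced by the sampler and a replay buffer, which is not modeled here.

```python
import numpy as np

def snis_weights(log_p_unnorm, log_q):
    """Self-normalized importance weights from unnormalized log-densities."""
    log_w = log_p_unnorm - log_q
    log_w = log_w - np.max(log_w)  # shift for numerical stability
    w = np.exp(log_w)
    return w / np.sum(w)           # normalization cancels the unknown constant

def log_target_unnorm(x):
    # Hypothetical unnormalized target: two unit-variance modes at -3 and +3.
    return np.logaddexp(-0.5 * (x + 3.0) ** 2, -0.5 * (x - 3.0) ** 2)

def log_proposal(x, scale=4.0):
    # Hypothetical proposal: zero-mean Gaussian with standard deviation `scale`.
    return -0.5 * (x / scale) ** 2 - np.log(scale * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 4.0, size=5000)          # samples from the proposal
w = snis_weights(log_target_unnorm(x), log_proposal(x))

ess = 1.0 / np.sum(w ** 2)                   # effective sample size
mass_right = np.sum(w * (x > 0))             # estimated target mass of the right mode
```

Under these assumptions, weights of this kind would multiply per-sample score-matching losses, so that proposal samples stand in for the unavailable target samples. The effective sample size `ess` gives a rough indication of how well the proposal covers the target.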
