Importance Sampling via Score-based Generative Models

Heasung Kim, Taekyun Lee, Hyeji Kim, Gustavo de Veciana

TL;DR

This work introduces a training-free importance sampling framework that leverages a pretrained score-based diffusion model for a base distribution $p(\mathbf{x})$ to sample from $q(\mathbf{x}) \propto l(\mathbf{x}) p(\mathbf{x})$ for arbitrary differentiable weights $l(\mathbf{x})$. The core idea replaces the intractable $\nabla_\mathbf{x} \log q_t(\mathbf{x})$ with a tractable approximation that combines $\nabla_\mathbf{x} \log p_t(\mathbf{x})$ and a Tweedie-based correction involving $l(\mathbf{x})$, enabling a backward SDE that generates importance samples without retraining. The authors prove an upper bound on the score gap between the true and approximated distributions that vanishes as time $t$ approaches zero under mild smoothness assumptions. Empirically, ISSGM achieves competitive or superior performance on synthetic manifolds and real-image tasks, including neural classifier weighting and high-frequency sampling, while remaining training-free. The approach promises practical benefits for adaptive sampling, bias mitigation, and model interpretation by enabling flexible, data-efficient importance sampling from a single base distribution.
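To make the score replacement concrete, here is a minimal one-dimensional sketch in Python. It is not the authors' implementation: it assumes a variance-preserving forward process with noise level `alpha_bar`, uses only a first-order correction (the gradient of $\log l$ evaluated at the Tweedie mean), and substitutes a central finite difference for automatic differentiation. The names `tweedie_mean`, `approx_score_q`, and the toy choices $p = \mathcal{N}(0, 1)$ and $l(x) = e^{Ax}$ are illustrative assumptions, picked because the true score of $q_t$ is known in closed form for them.

```python
import math

def tweedie_mean(x_t, score_p, alpha_bar):
    """Tweedie estimate of E[X0 | X_t = x_t], assuming a variance-preserving
    forward process X_t = sqrt(alpha_bar) * X0 + sqrt(1 - alpha_bar) * noise."""
    return (x_t + (1.0 - alpha_bar) * score_p(x_t)) / math.sqrt(alpha_bar)

def approx_score_q(x_t, score_p, log_l, alpha_bar, eps=1e-4):
    """First-order sketch of the score of q_t: the base score plus the
    derivative of log l evaluated at the Tweedie mean (hypothetical helper)."""
    def guided(x):
        return log_l(tweedie_mean(x, score_p, alpha_bar))
    # Central finite difference stands in for autodiff in this toy example.
    correction = (guided(x_t + eps) - guided(x_t - eps)) / (2.0 * eps)
    return score_p(x_t) + correction

# Toy setup (assumption): p = N(0, 1), so p_t = N(0, 1) and its score is -x.
A = 1.5
score_p = lambda x: -x
log_l = lambda x: A * x            # l(x) = exp(A * x), hence q = N(A, 1)

def exact_score_q(x_t, alpha_bar):
    """Closed-form score of q_t = N(sqrt(alpha_bar) * A, 1) for the toy setup."""
    return -(x_t - math.sqrt(alpha_bar) * A)

alpha_bar = 0.64
x_t = 0.3
approx = approx_score_q(x_t, score_p, log_l, alpha_bar)
exact = exact_score_q(x_t, alpha_bar)
print(approx, exact)
```

In this toy case $\log l$ is linear, so the first-order correction agrees with the exact score up to floating-point error. For a general $l$ the approximation carries an error, which is the quantity the score-gap bound addresses.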

Abstract

Importance sampling, which involves sampling from a probability density function (PDF) proportional to the product of an importance weight function and a base PDF, is a powerful technique with applications in variance reduction, biased or customized sampling, data augmentation, and beyond. Inspired by the growing availability of score-based generative models (SGMs), we propose an entirely training-free importance sampling framework that relies solely on an SGM for the base PDF. Our key innovation is realizing the importance sampling process as a backward diffusion process, expressed in terms of the score function of the base PDF and the specified importance weight function, both of which are readily available, eliminating the need for any additional training. We conduct a thorough analysis demonstrating the method's scalability and effectiveness across diverse datasets and tasks, including importance sampling for industrial and natural images with neural importance weight functions. The training-free aspect of our method is particularly compelling in real-world scenarios where a single base distribution underlies multiple biased sampling tasks, each requiring a different importance weight function. To the best of our knowledge, our approach is the first importance sampling framework to achieve this.
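As a brief worked step (a standard identity, written here under the assumption that both processes share the same forward noising kernel $k_t(\mathbf{x} \mid \mathbf{x}_0)$, a symbol introduced only for this illustration), the score of the importance sampling marginal decomposes into the base score plus a term that depends on $l$:

$$
q_t(\mathbf{x}) = \int q(\mathbf{x}_0)\, k_t(\mathbf{x} \mid \mathbf{x}_0)\, d\mathbf{x}_0 \;\propto\; \int l(\mathbf{x}_0)\, p(\mathbf{x}_0)\, k_t(\mathbf{x} \mid \mathbf{x}_0)\, d\mathbf{x}_0 = p_t(\mathbf{x})\, \mathbb{E}\big[\, l(\mathbf{X}'_0) \mid \mathbf{X}'_t = \mathbf{x} \,\big],
$$

$$
\nabla_{\mathbf{x}} \log q_t(\mathbf{x}) = \nabla_{\mathbf{x}} \log p_t(\mathbf{x}) + \nabla_{\mathbf{x}} \log \mathbb{E}\big[\, l(\mathbf{X}'_0) \mid \mathbf{X}'_t = \mathbf{x} \,\big].
$$

The first term is what a pretrained SGM for $p$ provides. The conditional expectation in the second term is generally intractable, which is why an approximation of it is needed.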

Paper Structure

This paper contains 29 sections, 1 theorem, 18 equations, 5 figures, 2 tables.

Key Result

Theorem 1

(Score function gap) Suppose that Assumptions assumption_bounded_norm–assumption_lipschitz hold and for all ${\mathbf{x}} \in \mathbb{R}^{d}$ and $t\in[0,T]$, the importance weight function $l$ is approximately equal to its second-order Taylor expansion at the estimated mean as $l({\mathbf{X}}'_{0}) where $\lambda_{t}= \frac{1}{2m}(\eta_{F} + \gamma_{t}\eta_{2}) + \frac{ \left({1+(1-\bar{\alpha}(t

Figures (5)

  • Figure 1: Top: Sampling process for ${\mathbf{X}}' \sim p({\mathbf{x}})$. Bottom: Importance sampling process for ${\mathbf{X}} \sim q({\mathbf{x}})$. The proposed method utilizes the pretrained $\nabla_{{\mathbf{x}}} \log p_{t}({\mathbf{x}})$ and the given importance weight function $l({\mathbf{x}})$ to compute $\nabla_{{\mathbf{x}}} \log q_{t}({\mathbf{x}})$ without necessitating any additional training.
  • Figure 2: Top row: sampling process for ${\mathbf{X}}' \sim p({\mathbf{x}})$. Bottom row: proposed importance sampling process for ${\mathbf{X}} \sim q({\mathbf{x}})$. From left to right, each column corresponds to $t=500, 400, \ldots, 0$, showing the distributions of $\mathbb{E}[{\mathbf{X}}'_{0}|{\mathbf{X}}'_{t}]$ and $\mathbb{E}[{\mathbf{X}}_{0}|{\mathbf{X}}_{t}]$. Thus, the rightmost column illustrates the PDFs $p({\mathbf{x}})$ (top) and $q({\mathbf{x}})$ (bottom). The proposed approach enables efficient importance sampling on the correct spiral-shaped manifold without any additional training for $l({\mathbf{x}})$, selectively emphasizing high-norm instances via $l({\mathbf{x}})=\Vert {\mathbf{x}} \Vert^{2}$.
  • Figure 3: Left: Histogram of $l({\mathbf{x}})$ (distortion) for samples from $p({\mathbf{x}})$. Right: The same histogram for samples from the importance sampling PDF $q({\mathbf{x}})$, which assigns higher weights to instances with high distortion in the neural compressor. This allows rare features to be more readily observed by increasing the likelihood of sampling high-distortion instances.
  • Figure 4: Can a neural classifier serve as an importance weight, in a completely training-free manner, for an SGM that was never trained with class information? Yes. Our method can use any external differentiable importance weight function, e.g., a neural gender classifier.
  • Figure 5: First row: Samples from $p({\mathbf{x}})$. Second row: Samples generated from $q({\mathbf{x}})$. Our approach can generate samples with elevated high-frequency components by setting $l({\mathbf{x}})$ accordingly.

Theorems & Definitions (5)

  • Definition 1
  • Remark 1: Training-Free Importance Sampling
  • Remark 2: High Scalability and Applicability
  • Theorem 1
  • Remark 3: Bounded score function gap by variance