Denoising Fisher Training For Neural Implicit Samplers

Weijian Luo, Wei Deng

TL;DR

This paper introduces Denoising Fisher Training (DFT), a novel training approach for neural implicit samplers that comes with theoretical guarantees and is empirically validated on diverse sampling benchmarks: two-dimensional synthetic distributions, Bayesian logistic regression, and high-dimensional energy-based models (EBMs).

Abstract

Efficient sampling from un-normalized target distributions is pivotal in scientific computing and machine learning. Neural samplers have shown promise, particularly in sampling efficiency, but existing neural implicit samplers still suffer from poor mode coverage, unstable training dynamics, and sub-optimal performance. To tackle these issues, we introduce Denoising Fisher Training (DFT), a novel training approach for neural implicit samplers with theoretical guarantees. We frame training as minimizing the Fisher divergence and derive a tractable yet equivalent loss function, a theoretical contribution toward evaluating the otherwise intractable Fisher divergence. DFT is empirically validated across diverse sampling benchmarks, including two-dimensional synthetic distributions, Bayesian logistic regression, and high-dimensional energy-based models (EBMs). Notably, in experiments with high-dimensional EBMs, our best one-step DFT neural sampler achieves results on par with MCMC methods that use up to 200 sampling steps, making it more than 100 times more efficient. This result demonstrates the strong performance of DFT on complex high-dimensional sampling problems and sheds light on efficient sampling methodologies for broader applications.
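To make the quantity in the abstract concrete, the following is a minimal sketch of a Monte Carlo estimate of the Fisher divergence, taken here as $\mathbb{E}_{x \sim p}\,\|\nabla_x \log p(x) - \nabla_x \log q(x)\|^2$ (some texts add a factor of $1/2$). It is a toy illustration, not the DFT loss from the paper: both distributions are isotropic Gaussians whose scores are known in closed form, whereas for a neural implicit sampler the sampler's score is intractable, which is the difficulty DFT is designed to address. The function names and the example parameters are hypothetical.

```python
import numpy as np

def gaussian_score(x, mean, std):
    """Score (gradient of the log-density in x) of an isotropic Gaussian N(mean, std^2 I)."""
    return -(x - mean) / std**2

def fisher_divergence_mc(samples, score_p, score_q):
    """Monte Carlo estimate of E_p ||score_p(x) - score_q(x)||^2 using samples x ~ p."""
    diff = score_p(samples) - score_q(samples)
    return float(np.mean(np.sum(diff**2, axis=-1)))

# Toy setup (assumed for illustration): two 2-D Gaussians with a shared std.
rng = np.random.default_rng(0)
mean_p = np.array([0.0, 0.0])
mean_q = np.array([1.0, -2.0])
std = 1.5

x = rng.normal(mean_p, std, size=(10_000, 2))  # samples from p
estimate = fisher_divergence_mc(
    x,
    lambda s: gaussian_score(s, mean_p, std),
    lambda s: gaussian_score(s, mean_q, std),
)

# With equal variances the score difference is constant in x,
# so the divergence equals ||mean_p - mean_q||^2 / std^4.
closed_form = float(np.sum((mean_p - mean_q) ** 2) / std**4)
print(estimate, closed_form)
```

Because the two Gaussians share a variance, the estimate matches the closed form up to floating-point error. With unequal variances the score difference depends on $x$, and the estimate would carry ordinary Monte Carlo error.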

Paper Structure

This paper contains 39 sections, 2 theorems, 21 equations, 5 figures, 3 tables, 2 algorithms.

Key Result

Theorem 1

If the distribution $p_{\theta, \sigma}$ satisfies some mild regularity conditions, then for every vector-valued score function $\bm{s}_{q}(\cdot)$, the following equation holds for all parameters $\theta$:

Figures (5)

  • Figure 1: Visualizations of the DFT neural sampler (DFT-NS) against Stein Variational Gradient Descent (SVGD) \cite{liu2016stein} with multiple sampling steps. The DFT-NS with 1 sampling step outperforms SVGD with 200 sampling steps. Upper: Donut target distribution in Table \ref{tab:1}; Lower: Rosenbrock target distribution in Table \ref{tab:1}.
  • Figure 2: Comparison of KSD values of DFT-NS using the full and partial gradients in Equation \ref{eqn:tFD_grad_full}.
  • Figure 3: A visualization of DFT-NS on MNIST against the default MCMC sampling result from DeepEBM. The one-step DFT-NS outperforms MCMC samplers with 200 sampling steps, as shown in Table \ref{tab:t5}.
  • Figure 4: The test-accuracy curves of DFT-NS (ours) and KL-NS \cite{luo2024entropy} for Bayesian logistic regression. Though KL-NS converges faster, DFT-NS reaches better final accuracy than KL-NS given sufficient training iterations.
  • Figure 5: The log-FID curve of one-step DFT-NS on the MNIST image distribution. The blue dashed line marks the logarithm of the FID of 200-step MCMC (annealed Langevin dynamics) samples from the pre-trained DeepEBM \cite{Li2019LearningEM}.

Theorems & Definitions (5)

  • Theorem 1
  • proof
  • proof
  • Lemma 2
  • proof: Proof of Lemma \ref{lemma:score_projectioin}