Learning the Stein Discrepancy for Training and Evaluating Energy-Based Models without Sampling
Will Grathwohl, Kuan-Chieh Wang, Jörn-Henrik Jacobsen, David Duvenaud, Richard Zemel
TL;DR
This paper introduces the Learned Stein Discrepancy (LSD), which uses a neural-network critic to compare a data density $p$ with an unnormalized model $q$ without drawing samples from the model. The objective follows from Stein's identity, and Hutchinson's trick makes it efficient to estimate, so LSD can be used both for goodness-of-fit (GoF) testing and for training energy-based models (EBMs) in high dimensions. The authors report that LSD matches or outperforms kernel-based Stein methods in GoF testing and model evaluation, and that it enables sampler-free EBM training that scales to complex densities, with experiments involving RBMs, ICA, and deep flows. The result is a single, scalable framework for evaluating and learning unnormalized densities, with potential applications in robustness, calibration, and high-dimensional density modeling.
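For reference, the two ingredients named above can be written out as follows. This is a sketch in standard notation rather than a quotation from the paper: the critic symbol $f$, the critic family $\mathcal{F}$, and the noise variable $\epsilon$ are notational assumptions introduced here. Stein's identity says that, under mild regularity conditions on $f$ and $q$, the Stein operator has zero mean under the model:

$$\mathbb{E}_{q(x)}\left[\nabla_x \log q(x)^\top f(x) + \mathrm{Tr}\left(\nabla_x f(x)\right)\right] = 0.$$

Taking the same expectation under the data density $p$ and maximizing over a family of critics gives a Stein discrepancy, which depends on $q$ only through $\nabla_x \log q(x)$ and therefore does not need the normalizing constant:

$$\mathcal{S}(p, q) = \sup_{f \in \mathcal{F}} \; \mathbb{E}_{p(x)}\left[\nabla_x \log q(x)^\top f(x) + \mathrm{Tr}\left(\nabla_x f(x)\right)\right].$$

Hutchinson's trick replaces the trace, which would otherwise need one derivative pass per input dimension, with a single random projection:

$$\mathrm{Tr}\left(\nabla_x f(x)\right) = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\left[\epsilon^\top \nabla_x f(x)\, \epsilon\right].$$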
Abstract
We present a new method for evaluating and training unnormalized density models. Our approach only requires access to the gradient of the unnormalized model's log-density. We estimate the Stein discrepancy between the data density $p(x)$ and the model density $q(x)$, which is defined by a vector function of the data. We parameterize this function with a neural network and fit its parameters to maximize the discrepancy. This yields a novel goodness-of-fit test which outperforms existing methods on high-dimensional data. Furthermore, optimizing $q(x)$ to minimize this discrepancy produces a novel method for training unnormalized models which scales more gracefully than existing methods. The ability to both learn and compare models is a unique feature of the proposed method.
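To make the estimator concrete, here is a minimal standard-library Python sketch of the quantity the abstract describes, evaluated for a fixed critic. It is not the authors' implementation. The critic `f(x) = tanh(W x)`, the matrix `W`, the choice of $q$ as a standard Gaussian, and every function name are illustrative assumptions. In the method itself the critic is a neural network whose parameters are fitted to maximize the discrepancy, and the Jacobian product would come from automatic differentiation rather than a hand-derived formula.

```python
import math
import random

# Hypothetical fixed critic f(x) = tanh(W x). In LSD the critic is a trained
# neural network; W here is an arbitrary illustrative choice.
W = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 0.5],
     [0.5, 0.0, 1.0]]

def matvec(M, v):
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def score_q(x):
    # Assumed model q = N(0, I): grad_x log q(x) = -x, no normalizer needed.
    return [-xi for xi in x]

def critic(x):
    return [math.tanh(z) for z in matvec(W, x)]

def critic_jvp(x, eps):
    # The Jacobian of tanh(W x) is diag(1 - tanh^2(W x)) W, so J eps = d * (W eps).
    d = [1.0 - math.tanh(z) ** 2 for z in matvec(W, x)]
    return [di * wi for di, wi in zip(d, matvec(W, eps))]

def exact_trace(x):
    # Closed-form trace of the Jacobian, used only to check the estimator.
    d = [1.0 - math.tanh(z) ** 2 for z in matvec(W, x)]
    return sum(d[i] * W[i][i] for i in range(len(x)))

def stein_sample(x, rng):
    # One-sample estimate of score_q(x)^T f(x) + Tr(J_f(x)),
    # with the trace replaced by Hutchinson's eps^T J eps.
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    return dot(score_q(x), critic(x)) + dot(eps, critic_jvp(x, eps))

def estimate(mean, n=20000, seed=0):
    # Monte Carlo average over data x ~ N(mean, I), standing in for p.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = [rng.gauss(m, 1.0) for m in mean]
        total += stein_sample(x, rng)
    return total / n

on_model = estimate([0.0, 0.0, 0.0])   # data drawn from q itself
off_model = estimate([1.0, 1.0, 1.0])  # data drawn from a shifted Gaussian
```

When the data come from $q$ itself, `on_model` should be close to zero, as Stein's identity predicts. When the data come from the shifted Gaussian, `off_model` should be clearly nonzero. Its sign depends on the critic chosen. With this fixed critic it is negative, and the negated critic gives the same magnitude with a positive sign, which is the direction a maximizing critic would move in.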
