Fast and Unified Path Gradient Estimators for Normalizing Flows
Lorenz Vaitl, Ludwig Winkler, Lorenz Richter, Pan Kessel
TL;DR
The paper addresses the efficiency barrier of path gradient estimators for normalizing flows (NFs) by introducing fast, unified path gradient estimators that work across practical NF architectures. It derives a recursive formulation, evaluated during the forward pass, for the derivatives of the flow's log-density with respect to the sample that the path gradient requires. This enables efficient estimation for both coupling and implicitly invertible flows, with linear-time complexity in the coupling case and constant-memory usage. By leveraging a pullback perspective, the forward KL gradient is expressed as a reverse-KL-like path gradient, allowing fast, low-variance maximum-likelihood training that can incorporate a target energy function as regularization. Empirical results on Gaussian mixtures and lattice gauge theories demonstrate reduced variance and improved convergence, while runtime analyses show significant speedups over prior methods, broadening the applicability of path-gradient NF training in physics and ML contexts.
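The recursive forward-pass idea can be illustrated with a minimal NumPy sketch. It assumes a toy flow built from a hypothetical elementwise layer f(x) = x + c*tanh(x), which is strictly increasing for c > -1 and has a diagonal Jacobian; the names `layer` and `flow_forward_with_score` are our own. The score d/dx log q(x) is carried along with the sample via the chain rule d/dy log q_{l+1}(y) = (d/dx log q_l(x) - d/dx log|f'(x)|) / f'(x), so no separate backward pass through the flow is needed. This illustrates the general principle only; it is not the paper's algorithm for coupling or implicitly invertible layers.

```python
import numpy as np

def layer(x, c):
    """Hypothetical toy layer (assumption, not from the paper): f(x) = x + c*tanh(x), c > -1."""
    t = np.tanh(x)
    sech2 = 1.0 - t ** 2
    y = x + c * t
    dy_dx = 1.0 + c * sech2                    # diagonal Jacobian entry f'(x) > 0
    # d/dx log f'(x) = c * d/dx sech^2(x) / f'(x), with d/dx sech^2(x) = -2 sech^2(x) tanh(x)
    dlogdet_dx = c * (-2.0 * sech2 * t) / dy_dx
    return y, dy_dx, dlogdet_dx

def flow_forward_with_score(z, cs):
    """Push base samples z ~ N(0, I) through the toy flow and return, in the same pass,
    x, the per-dimension log-density terms (sum them for log q(x)), and d/dx log q(x)."""
    x = z
    logq = -0.5 * z ** 2 - 0.5 * np.log(2.0 * np.pi)   # log N(z; 0, 1) per dimension
    score = -z                                         # d/dz log N(z; 0, 1)
    for c in cs:
        y, dy_dx, dlogdet_dx = layer(x, c)
        logq = logq - np.log(dy_dx)                    # change of variables
        score = (score - dlogdet_dx) / dy_dx           # chain-rule recursion for the score
        x = y
    return x, logq, score
```

In a multivariate flow the division by f'(x) becomes multiplication by the inverse Jacobian, which stays cheap when the Jacobian is triangular, as in coupling layers.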
Abstract
Recent work shows that path gradient estimators for normalizing flows have lower variance than standard estimators for variational inference, resulting in improved training. However, they often come at a prohibitive computational cost relative to standard estimators and cannot be applied to maximum likelihood training in a scalable manner, which severely hinders their widespread adoption. In this work, we overcome these crucial limitations. Specifically, we propose a fast path gradient estimator that significantly improves computational efficiency and works for all normalizing flow architectures of practical relevance. We then show that this estimator can also be applied to maximum likelihood training, where it has a regularizing effect because it can take the form of a given target energy function into account. We empirically establish its superior performance and reduced variance for several natural-science applications.
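The variance claim and the maximum-likelihood extension can be checked on a toy problem. The sketch below assumes a one-dimensional affine flow x = mu + sigma*eps with eps ~ N(0, 1) and a Gaussian target p = N(m, s^2); all function names are hypothetical. `reverse_kl_grads` returns per-sample path gradients of KL(q_theta || p), which keep only (d/dx log q_theta - d/dx log p) * dx/dtheta, next to the standard total (reparameterization) gradient, which additionally contains the zero-mean score term d/dtheta log q_theta(x). `forward_kl_path_grads` is our reading of the pullback idea: target samples x ~ p are mapped to z = T^{-1}(x), and KL(p || q_theta) is treated as a reverse KL between the pulled-back target and the base density, which brings d/dx log p (the target energy) into the gradient. This is not the authors' implementation.

```python
import numpy as np

# Toy setting (assumption): q_theta = N(mu, sigma^2) via x = mu + sigma * eps, target p = N(m, s^2).
def grad_log_p(x, m, s):
    return -(x - m) / s ** 2                       # gradient of the target log-density (minus the energy gradient)

def reverse_kl_grads(eps, mu, sigma, m, s):
    """Per-sample (path, total) gradients of KL(q_theta || p) w.r.t. (mu, sigma), shape (2, N)."""
    x = mu + sigma * eps
    grad_log_q = -(x - mu) / sigma ** 2            # d/dx log q_theta(x)
    delta = grad_log_q - grad_log_p(x, m, s)
    dx_dmu, dx_dsigma = np.ones_like(eps), eps     # dx/dtheta along the sampling path
    path = np.stack([delta * dx_dmu, delta * dx_dsigma])
    # explicit score term d/dtheta log q_theta(x) at fixed x: zero mean, dropped by the path gradient
    score_term = np.stack([eps / sigma, (eps ** 2 - 1.0) / sigma])
    return path, path + score_term

def forward_kl_path_grads(x, mu, sigma, m, s):
    """Per-sample path gradients of KL(p || q_theta) for target samples x ~ p, via the pullback
    z = T^{-1}(x) = (x - mu) / sigma, whose density is p~(z) = sigma * p(mu + sigma * z)."""
    z = (x - mu) / sigma
    grad_log_pull = sigma * grad_log_p(x, m, s)    # d/dz log p~(z): uses the target energy
    grad_log_base = -z                             # d/dz log N(z; 0, 1)
    delta = grad_log_pull - grad_log_base
    dz_dmu, dz_dsigma = -np.ones_like(z) / sigma, -z / sigma
    return np.stack([delta * dz_dmu, delta * dz_dsigma])
```

With q_theta = p the path estimator is exactly zero for every sample, while the total estimator still fluctuates through its score term; away from the optimum both are unbiased estimates of the same gradient.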
