Optimal Stochastic Trace Estimation in Generative Modeling
Xinyang Liu, Hengrong Du, Wei Deng, Ruqi Zhang
TL;DR
The paper tackles the high variance of Hutchinson trace estimation in OT-guided generative modeling and diffusion-based objectives. It studies Hutch++, a variance-reduced trace estimator that splits the trace into an exact large-eigenvalue component and a stochastic remainder, together with an acceleration scheme that reuses the top eigenvectors so that the cost of QR factorizations is amortized over time. The theoretical analysis establishes unbiasedness and variance bounds and discusses computational complexity, showing substantial variance reductions over the vanilla Hutchinson estimator. Empirically, Hutch++ improves training efficiency and generation quality across neural ODE-based models, Schrödinger-bridge diffusion, time-series tasks, and image generation, while preserving OT guarantees on high-dimensional data. The approach enables faster, more accurate transport maps in diverse generative settings and may apply more broadly to OT-based learning and simulation.
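The split described above can be written out as follows. This is a sketch in notation assumed here rather than taken from the paper: A is a d × d matrix whose trace is needed, Q is a d × k matrix with orthonormal columns approximately spanning the dominant subspace of A, and g_1, ..., g_m are random probe vectors with zero mean and identity covariance (for example, Rademacher vectors).

```latex
\begin{align}
\operatorname{tr}(A)
  &= \underbrace{\operatorname{tr}\!\left(Q^\top A Q\right)}_{\text{computed exactly}}
   + \underbrace{\operatorname{tr}\!\left((I - QQ^\top)\, A\, (I - QQ^\top)\right)}_{\text{estimated stochastically}}, \\
\operatorname{tr}\!\left((I - QQ^\top)\, A\, (I - QQ^\top)\right)
  &\approx \frac{1}{m} \sum_{i=1}^{m} g_i^\top (I - QQ^\top)\, A\, (I - QQ^\top)\, g_i .
\end{align}
```

The first identity holds for any square A because I - QQ^T is a projection and the trace is invariant under cyclic permutation. The variance of the second line depends only on the part of A left after deflation, which is small when the spectrum decays quickly.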
Abstract
Hutchinson estimators are widely employed in training divergence-based likelihoods for diffusion models to ensure optimal transport (OT) properties. However, these estimators often suffer from high variance and scalability concerns. To address these challenges, we investigate Hutch++, an optimal stochastic trace estimator for generative models, designed to minimize training variance while maintaining transport optimality. Hutch++ is particularly effective for ill-conditioned matrices with large condition numbers, which commonly arise when high-dimensional data exhibits a low-dimensional structure. To mitigate the need for frequent and costly QR decompositions, we propose practical schemes that balance frequency and accuracy, backed by theoretical guarantees. Our analysis demonstrates that Hutch++ leads to higher-quality generations. Furthermore, the method exhibits effective variance reduction in various applications, including simulations, conditional time series forecasts, and image generation.
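To make the variance comparison concrete, here is a minimal NumPy sketch of the vanilla Hutchinson estimator next to the standard Hutch++ estimator, under an equal budget of matrix-vector products. It is an illustration under stated assumptions, not the authors' training code: the function names, the explicit test matrix, and the `matvec` callback are hypothetical, and in a generative model the matrix-vector product would typically come from automatic differentiation of a Jacobian rather than from a stored matrix. The amortized QR schedule mentioned in the abstract is not implemented here.

```python
import numpy as np


def hutchinson(matvec, dim, num_queries, rng):
    """Vanilla Hutchinson estimate of tr(A) from Rademacher probes."""
    G = rng.choice([-1.0, 1.0], size=(dim, num_queries))
    # sum_j g_j^T A g_j, averaged over the probes
    return np.sum(G * matvec(G)) / num_queries


def hutchpp(matvec, dim, num_queries, rng):
    """Hutch++ estimate of tr(A) using about num_queries matrix-vector products.

    A third of the budget sketches the dominant subspace, a third evaluates
    the trace on that subspace exactly, and a third runs Hutchinson on the
    deflated remainder.
    """
    k = num_queries // 3
    if k < 1:
        raise ValueError("num_queries must be at least 3")
    S = rng.choice([-1.0, 1.0], size=(dim, k))
    G = rng.choice([-1.0, 1.0], size=(dim, k))
    # Orthonormal basis for the range of A @ S (approximate top subspace).
    Q, _ = np.linalg.qr(matvec(S))
    exact_part = np.trace(Q.T @ matvec(Q))
    # Project the probes onto the orthogonal complement of span(Q).
    G_perp = G - Q @ (Q.T @ G)
    remainder = np.sum(G_perp * matvec(G_perp)) / k
    return exact_part + remainder


def decaying_spd_matrix(dim, rng):
    """Hypothetical test matrix: symmetric PSD with eigenvalues 1 / i^2."""
    U, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    eigvals = 1.0 / np.arange(1, dim + 1) ** 2
    return (U * eigvals) @ U.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, budget, trials = 200, 30, 200
    A = decaying_spd_matrix(dim, rng)
    matvec = lambda X: A @ X
    true_trace = np.trace(A)
    h = [hutchinson(matvec, dim, budget, rng) for _ in range(trials)]
    hpp = [hutchpp(matvec, dim, budget, rng) for _ in range(trials)]
    print("Hutchinson MSE:", np.mean((np.array(h) - true_trace) ** 2))
    print("Hutch++   MSE:", np.mean((np.array(hpp) - true_trace) ** 2))
```

The test matrix has a rapidly decaying spectrum, mimicking the low-dimensional structure the abstract describes. In that regime most of the trace is captured by the exact term, so the stochastic remainder contributes little variance. For a matrix with a flat spectrum the advantage over plain Hutchinson would be expected to shrink.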
