Minimizing $f$-Divergences by Interpolating Velocity Fields
Song Liu, Jiahao Yu, Jack Simons, Mingxuan Yi, Mark Beaumont
TL;DR
This work minimizes the $f$-divergence between a target distribution $p$ and a particle approximation $q_t$ using Wasserstein Gradient Flows (WGF). The WGF velocity field depends on the density ratio $r_t = p/q_t$, which is typically unknown. The paper estimates the velocity field directly by interpolation: (i) Nadaraya-Watson (NW) interpolation for the backward KL divergence when $\nabla \log p$ is available, which connects to Stein Variational Gradient Descent (SVGD), and (ii) a general local linear interpolation, derived from a mirror-divergence variational objective, that handles general $f$-divergences using samples alone. The authors prove consistency and derive convergence rates for the estimators, and demonstrate them on domain adaptation and missing data imputation tasks, where they often outperform density-ratio-based approaches and other baselines. The approach enables WGF from samples alone, provides a unified mirror-divergence framework, and offers practical model-selection strategies, though challenges remain for high-dimensional data and distributions with non-overlapping supports.
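To make the NW and SVGD connection concrete, here is a minimal sketch of a kernel-weighted (NW-style) velocity estimate for the backward KL case. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the fixed bandwidth, the Gaussian target, the plain Euler update, and the names `rbf_kernel`, `nw_velocity`, and `run_flow` are all hypothetical choices made here. The estimate averages $\nabla \log r_t$ over particles with kernel weights; integration by parts replaces the unknown $\nabla \log q_t$ with a kernel-gradient term, which gives the familiar SVGD direction divided by the kernel sum.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth):
    """Gaussian kernel matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 * bandwidth^2))."""
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def nw_velocity(particles, score_p, bandwidth=1.0):
    """Kernel-weighted estimate of grad log(p / q_t) at each particle.

    particles: array of shape (n, d), samples from q_t.
    score_p:   callable returning grad log p at each row, shape (n, d).
    """
    K = rbf_kernel(particles, particles, bandwidth)          # (n, n)
    # Attraction: sum_j k(x_i, x_j) * grad log p(x_j)
    attract = K @ score_p(particles)                         # (n, d)
    # Repulsion: sum_j grad_{x_j} k(x_i, x_j)
    #          = sum_j k(x_i, x_j) * (x_i - x_j) / bandwidth^2
    diffs = particles[:, None, :] - particles[None, :, :]    # (n, n, d)
    repulse = np.sum(K[:, :, None] * diffs, axis=1) / bandwidth ** 2
    # NW normalization: divide by the kernel mass around each particle.
    return (attract + repulse) / K.sum(axis=1, keepdims=True)


def run_flow(target_mean, n_particles=100, n_steps=300, step_size=0.1, seed=0):
    """Euler-discretized particle flow toward an assumed target N(target_mean, I)."""
    rng = np.random.default_rng(seed)
    dim = target_mean.shape[0]
    particles = 0.5 * rng.normal(size=(n_particles, dim))    # start near the origin

    def score_p(x):
        # grad log p for a unit-covariance Gaussian target
        return -(x - target_mean)

    for _ in range(n_steps):
        particles = particles + step_size * nw_velocity(particles, score_p)
    return particles


if __name__ == "__main__":
    final = run_flow(np.array([2.0, -1.0]))
    print("particle mean:", final.mean(axis=0))
```

Dropping the final division by `K.sum(...)` and dividing by the number of particles instead recovers the usual SVGD update direction, which is the sense in which the two are connected in this sketch.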
Abstract
Many machine learning problems can be seen as approximating a \textit{target} distribution using a \textit{particle} distribution by minimizing their statistical discrepancy. Wasserstein Gradient Flow can move particles along a path that minimizes the $f$-divergence between the target and particle distributions. To move particles, we need to calculate the corresponding velocity fields derived from a density ratio function between these two distributions. Previous works estimated such density ratio functions and then differentiated the estimated ratios. These approaches may suffer from overfitting, leading to a less accurate estimate of the velocity fields. Inspired by non-parametric curve fitting, we directly estimate these velocity fields using interpolation techniques. We prove that our estimators are consistent under mild conditions. We validate their effectiveness using novel applications on domain adaptation and missing data imputation.
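For reference, the dependence of the velocity field on the density ratio can be written out as follows. This is a sketch under one common convention, $D_f(p \| q_t) = \int q_t(x) \, f(r_t(x)) \, dx$ with $r_t = p/q_t$; the paper's own notation and sign conventions may differ. Taking the first variation with respect to $q_t$ gives $f(r_t) - r_t f'(r_t)$, and the flow moves particles along its negative gradient:

$$
v_t(x) = \nabla h(r_t(x)), \qquad h(r) = r f'(r) - f(r).
$$

For the backward KL divergence, $f(r) = -\log r$ gives $h(r) = \log r - 1$, so $v_t = \nabla \log r_t = \nabla \log p - \nabla \log q_t$. For the forward KL divergence, $f(r) = r \log r$ gives $h(r) = r$, so $v_t = \nabla r_t$.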
