Minimizing $f$-Divergences by Interpolating Velocity Fields

Song Liu; Jiahao Yu; Jack Simons; Mingxuan Yi; Mark Beaumont

TL;DR

This work addresses minimizing $f$-divergences between a target distribution and a particle approximation via Wasserstein Gradient Flows (WGF), where the velocity field depends on the density ratio $r_t = p/q_t$, which is typically unknown. It estimates the velocity field directly by interpolation: (i) Nadaraya-Watson (NW) interpolation for the backward KL divergence when $\nabla \log p$ is available, which connects to Stein Variational Gradient Descent (SVGD), and (ii) a general local linear interpolation, framed by a mirror-divergence variational objective, that handles general $f$-divergences using samples alone. The authors prove consistency and derive convergence rates for the estimators, and demonstrate them on domain adaptation and missing data imputation tasks, where they often outperform density-ratio-based approaches or baselines. The approach runs WGF from samples alone, provides a unified mirror-divergence framework, and offers practical model-selection strategies, though challenges remain for high-dimensional data and non-overlapping supports.

Abstract

Many machine learning problems can be seen as approximating a \textit{target} distribution using a \textit{particle} distribution by minimizing their statistical discrepancy. Wasserstein Gradient Flow can move particles along a path that minimizes the $f$-divergence between the target and particle distributions. To move particles, we need to calculate the corresponding velocity fields derived from a density ratio function between these two distributions. Previous works estimated such density ratio functions and then differentiated the estimated ratios. These approaches may suffer from overfitting, leading to a less accurate estimate of the velocity fields. Inspired by non-parametric curve fitting, we directly estimate these velocity fields using interpolation techniques. We prove that our estimators are consistent under mild conditions. We validate their effectiveness using novel applications on domain adaptation and missing data imputation.
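The direct-estimation idea for the backward KL case can be illustrated with a minimal sketch. It assumes a Gaussian kernel, a standard normal target (so $\nabla \log p(x) = -x$), and a plain Euler update; the names `nw_velocity` and `run_flow`, the bandwidth, and the step size are illustrative choices and not the authors' implementation. The kernel-gradient term stands in for the unknown $\nabla \log q_t$ via integration by parts, which is also where the link to SVGD shows up.

```python
import numpy as np

def nw_velocity(x, score_p, bandwidth=1.0):
    """Kernel-weighted estimate of grad log(p / q_t) at every particle.

    x        : (n, d) array of particles drawn from q_t
    score_p  : callable mapping (n, d) -> (n, d), returning grad log p
    bandwidth: Gaussian kernel width (illustrative default, not a tuned value)
    """
    diff = x[:, None, :] - x[None, :, :]             # diff[i, j] = x_i - x_j
    k = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * bandwidth ** 2))  # k[i, j]
    grad_k = -diff / bandwidth ** 2 * k[:, :, None]  # gradient of k[i, j] w.r.t. x_i
    # Integration by parts replaces the unknown grad log q_t with a kernel gradient.
    numer = k.T @ score_p(x) + grad_k.sum(axis=0)
    denom = k.sum(axis=0)[:, None]                   # at least 1, since k[j, j] = 1
    return numer / denom

def run_flow(x0, score_p, step=0.1, n_steps=100, bandwidth=1.0):
    """Move particles with Euler steps along the estimated velocity field."""
    x = x0.copy()
    for _ in range(n_steps):
        x = x + step * nw_velocity(x, score_p, bandwidth)
    return x

# Assumed toy setup: particles start at N(3, I) and the target is N(0, I).
rng = np.random.default_rng(0)
x0 = rng.normal(loc=3.0, scale=1.0, size=(200, 2))
x_final = run_flow(x0, score_p=lambda x: -x)
```

In this toy setup the particle mean should drift from roughly 3 toward 0 in each coordinate. Dividing by the kernel sum is what distinguishes this normalized form from the usual SVGD update, which averages the same numerator over all particles instead.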

Paper Structure (42 sections, 11 theorems, 76 equations, 12 figures, 2 tables, 1 algorithm)

Introduction
Background
Wasserstein Gradient Flows of $f$-divergence
Direct Velocity Field Estimation by Interpolation
Nadaraya-Watson (NW) Interpolation of Backward KL Velocity Field
Effectiveness of NW Estimator
Velocity Field Interpolation from Samples
Mirror Divergence
Gradient Estimator using Local Linear Interpolation
Effectiveness of Local Linear Interpolation
Model Selection via Local Linear Interpolation
Experiments
Reducing KL Divergence: SVGD vs. NW vs. Local Linear Estimator
Joint Domain Adaptation
Missing Data Imputation
...and 27 more sections

Key Result

Theorem 2.1

The Wasserstein gradient flow of $D_f[p, q_t]$ characterizes the particle evolution via the ODE:
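For reference, the ODE in the cited result (Corollary 3.3 of the MonoFlow paper) is usually written in the following form. This is a sketch under the assumption that $D_f[p, q_t] = \int q_t \, f(r_t)$ with $r_t = p/q_t$; the symbol $h$ is introduced here for illustration and should be checked against the paper's own statement.

$$
\frac{\mathrm{d}x_t}{\mathrm{d}t} = \nabla (h \circ r_t)(x_t), \qquad h(r) = r f'(r) - f(r), \qquad r_t = \frac{p}{q_t}.
$$

As a consistency check under the same assumption, the backward KL divergence corresponds to $f(r) = -\log r$, which gives $h(r) = \log r - 1$ and hence the velocity $\nabla \log r_t = \nabla \log p - \nabla \log q_t$.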

Figures (12)

Figure 1: Estimating a log density ratio $\log r$ using a flexible model (RBF kernel) leads to an overfitted estimate ($\log r_1$). The overfitting consequently causes huge fluctuations in the derivative $(\log r_1)'$. Our proposed method provides a much more stable estimate $\log r_2$ and a more accurate estimate of $(\log r_2)'$.
Figure 2: Particle Trajectories of SVGD, SVGD with AdaGrad, NW, LL. Approximated $\mathrm{KL}[q_t, p]$ with different methods.
Figure 3: Left: the source classifier (represented by colored areas) misclassifies many testing points (colored dots). Middle: WGF moves particles to align the source and target samples. Lines are trajectories of sample movements in each class. Right: the retrained classifier on the transported source samples gives a much better prediction.
Figure 4: Comparison of imputation methods. Fully observed samples are plotted in blue, and imputed samples in red. The leftmost plot shows the initial particles in the WGF imputation. The second plot from the left visualizes the imputation trajectories of different particles. The third plot from the left is the final output after 100 WGF iterations.
Figure 5: AUROC of a linear SVM classifier on the imputed Breast Cancer dataset. Base indicates the performance of a baseline imputer that fills in the missing values with Gaussian noise.
...and 7 more figures

Theorems & Definitions (23)

Theorem 2.1 (Corollary 3.3 in Yi et al., 2023)
Proposition 3.1
Definition 4.1
Example 4.2
Example 4.3
Theorem 4.8
Corollary 4.9
Corollary 4.10
proof
Lemma 1.1
...and 13 more
