
Minimizing $f$-Divergences by Interpolating Velocity Fields

Song Liu, Jiahao Yu, Jack Simons, Mingxuan Yi, Mark Beaumont

TL;DR

This work addresses minimizing $f$-divergences between a target distribution $p$ and a particle approximation $q_t$ via Wasserstein Gradient Flows (WGF). The velocity field of the flow depends on the density ratio $r_t = p/q_t$, which is typically unknown. The paper proposes estimating the velocity field directly by interpolation: (i) Nadaraya-Watson (NW) interpolation for the backward KL divergence when $\nabla \log p$ is available, which connects to Stein Variational Gradient Descent (SVGD); and (ii) a general local linear (LL) interpolation, framed by a mirror-divergence variational objective, that handles general $f$-divergences using samples alone. The authors prove consistency and derive convergence rates for the estimators, and demonstrate their effectiveness on domain adaptation and missing data imputation, where they often outperform density-ratio-based approaches and other baselines. The approach enables robust WGF using only samples, provides a unified mirror-divergence framework, and offers practical model-selection strategies, though challenges remain for high-dimensional data and for distributions with non-overlapping supports.
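To make item (i) concrete, here is a minimal NumPy sketch of one way a Nadaraya-Watson estimate of the backward-KL velocity $\nabla \log p - \nabla \log q_t$ can be built from particles and target scores. It takes the SVGD update (kernel-weighted target scores plus kernel gradients) and normalizes it by the kernel sum; by integration by parts, the normalized kernel-gradient term approximates $-\nabla \log q_t$. This form is an illustrative assumption, not necessarily the authors' exact estimator, and the Gaussian kernel, the bandwidth `h`, and the function names `rbf_kernel` and `nw_velocity` are hypothetical choices.

```python
import numpy as np


def rbf_kernel(x, y, h):
    """Gaussian kernel k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 h^2)); (n, d), (m, d) -> (n, m)."""
    diff = x[:, None, :] - y[None, :, :]
    return np.exp(-np.sum(diff**2, axis=-1) / (2 * h**2))


def nw_velocity(particles, grad_log_p, query, h):
    """Hypothetical NW estimate of grad log p(y) - grad log q(y) at each query point y.

    particles:  (n, d) samples from the particle distribution q_t
    grad_log_p: (n, d) target score grad log p evaluated at the particles
    query:      (m, d) points where the velocity is estimated
    h:          kernel bandwidth (assumed hyperparameter)
    """
    K = rbf_kernel(particles, query, h)               # (n, m), K[i, j] = k(x_i, y_j)
    # Gradient of the kernel w.r.t. the particle: -(x_i - y_j) / h^2 * k(x_i, y_j)
    diff = particles[:, None, :] - query[None, :, :]  # (n, m, d)
    grad_K = -diff / h**2 * K[:, :, None]             # (n, m, d)
    numer = K.T @ grad_log_p + grad_K.sum(axis=0)     # (m, d): SVGD-style numerator
    denom = K.sum(axis=0)[:, None]                    # (m, 1): NW normalizer
    return numer / denom


# Usage: particles from q = N(2, 1), target p = N(0, 1), so grad log p(x) = -x.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=(5000, 1))
query = np.array([[1.5], [2.0], [2.5]])
v = nw_velocity(x, -x, query, h=0.5)
```

In this Gaussian example the exact velocity is $-x + (x - 2) = -2$ everywhere, so `v` should be close to $-2$ near the bulk of the particles. Dividing the numerator by the particle count $n$ instead of the kernel sum gives the usual SVGD direction, which is one way to see the connection mentioned in the TL;DR.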

Abstract

Many machine learning problems can be seen as approximating a target distribution using a particle distribution by minimizing their statistical discrepancy. Wasserstein Gradient Flow (WGF) can move particles along a path that minimizes the $f$-divergence between the target and particle distributions. To move the particles, we need to compute the corresponding velocity fields, which are derived from the density ratio function between these two distributions. Previous works estimated such density ratio functions and then differentiated the estimated ratios. These approaches may suffer from overfitting, leading to less accurate estimates of the velocity fields. Inspired by non-parametric curve fitting, we directly estimate these velocity fields using interpolation techniques. We prove that our estimators are consistent under mild conditions. We validate their effectiveness in novel applications to domain adaptation and missing data imputation.

Paper Structure (42 sections, 11 theorems, 76 equations, 12 figures, 2 tables, 1 algorithm)

Key Result

Theorem 2.1

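The Wasserstein gradient flow of $D_f[p, q_t]$, where $D_f[p, q] = \int q(x)\, f\big(p(x)/q(x)\big)\,\mathrm{d}x$, characterizes the particle evolution via the ODE:

$$\frac{\mathrm{d}x_t}{\mathrm{d}t} = r_t^2(x_t)\, f''\big(r_t(x_t)\big)\, \nabla \log r_t(x_t), \qquad r_t = p/q_t.$$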
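Under the convention $D_f[p, q] = \int q\, f(p/q)\,\mathrm{d}x$, the first variation of $D_f$ with respect to $q$ is $f(r) - r f'(r)$, and taking minus its gradient gives the velocity $r f''(r) \nabla r = r^2 f''(r) \nabla \log r$. The choice of $f$ therefore enters only through the scalar factor $r^2 f''(r)$. The sketch below, which assumes that convention, evaluates the velocity for three common generators given values of $r$ and $\nabla \log r$; in practice these would come from an estimator such as the interpolation methods the paper proposes. The names `F_SECOND_DERIV` and `wgf_velocity` are hypothetical.

```python
import numpy as np

# Hypothetical table of second derivatives f''(r) for common generators,
# assuming the convention D_f[p, q] = E_q[f(p / q)].
F_SECOND_DERIV = {
    "backward_kl": lambda r: 1.0 / r**2,              # f(r) = -log r   gives KL[q, p]
    "forward_kl": lambda r: 1.0 / r,                  # f(r) = r log r  gives KL[p, q]
    "pearson_chi2": lambda r: 2.0 * np.ones_like(r),  # f(r) = (r - 1)^2
}


def wgf_velocity(r, grad_log_r, divergence):
    """Velocity r^2 f''(r) * grad log r of the WGF ODE.

    r:          (n,) density-ratio values p/q_t at the particles (e.g. estimated)
    grad_log_r: (n, d) gradients of log r at the particles (e.g. estimated)
    divergence: key into F_SECOND_DERIV
    """
    coeff = r**2 * F_SECOND_DERIV[divergence](r)  # (n,) scalar factor per particle
    return coeff[:, None] * grad_log_r            # (n, d) velocity per particle
```

For the backward KL the factor is identically 1, so the velocity reduces to $\nabla \log p - \nabla \log q_t$, which is why that case needs only an estimate of $\nabla \log r_t$ and not of $r_t$ itself.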

Figures (12)

  • Figure 1: Estimating a log density ratio $\log r$ using a flexible model (RBF kernel) leads to an overfitted estimate ($\log r_1$). The overfitting consequently causes large fluctuations in the derivative $(\log r_1)'$. Our proposed method provides a much more stable estimate $\log r_2$ and a more accurate estimate of the derivative $(\log r_2)'$.
  • Figure 2: Particle trajectories of SVGD, SVGD with AdaGrad, NW, and LL, and the approximated $\mathrm{KL}[q_t, p]$ for each method.
  • Figure 3: Left: the source classifier (represented by colored areas) misclassifies many testing points (colored dots). Middle: WGF moves particles to align the source and target samples. Lines are trajectories of sample movements in each class. Right: the retrained classifier on the transported source samples gives a much better prediction.
  • Figure 4: Comparison of imputation methods. Fully observed samples are plotted in blue, and imputed samples in red. The leftmost plot shows the initial particles in WGF imputation. The second plot from the left visualizes the imputation trajectories of different particles. The third plot from the left shows the final output after 100 WGF iterations.
  • Figure 5: AUROC of a linear SVM classifier on the imputed Breast Cancer dataset. "Base" denotes a baseline imputer that fills the missing values with Gaussian noise.
  • ...and 7 more figures

Theorems & Definitions (23)

  • Theorem 2.1: Corollary 3.3 in Yi et al. (2023), MonoFlow
  • Proposition 3.1
  • Definition 4.1
  • Example 4.2
  • Example 4.3
  • Theorem 4.8
  • Corollary 4.9
  • Corollary 4.10
  • Proof
  • Lemma 1.1
  • ...and 13 more