MMD-Regularized Unbalanced Optimal Transport

Piyushi Manupriya, J. Saketha Nath, Pratik Jawanpuria

TL;DR

The paper develops MMD-regularized unbalanced optimal transport (MMD-UOT) and derives a dual showing that MMD-UOT induces a new integral probability metric, one that lifts the ground cost to measures and interpolates between the Kantorovich and MMD metrics. It provides a finitely-supported transport plan estimator with an $\mathcal{O}(m^{-1/2})$ estimation-error rate in the number of samples $m$, along with an efficient solver based on accelerated projected gradient descent (APGD) and extensions to barycenters. The authors report consistent estimation and practical scalability, and show that MMD-UOT consistently outperforms φ-divergence-based UOT and MMD baselines across hypothesis testing, domain adaptation, scRNA-seq interpolation, and prompt-learning tasks. These results position MMD-UOT as a robust, sample-efficient alternative with strong theoretical properties and broad ML applicability.

Abstract

We study the unbalanced optimal transport (UOT) problem, where the marginal constraints are enforced using Maximum Mean Discrepancy (MMD) regularization. Our work is motivated by the observation that the literature on UOT is focused on regularization based on $\phi$-divergence (e.g., KL divergence). Despite the popularity of MMD, its role as a regularizer in the context of UOT seems less understood. We begin by deriving a specific dual of MMD-regularized UOT (MMD-UOT), which helps us prove several useful properties. One interesting outcome of this duality result is that MMD-UOT induces novel metrics, which not only lift the ground metric like the Wasserstein but are also sample-wise efficient to estimate like the MMD. Further, for real-world applications involving non-discrete measures, we present an estimator for the transport plan that is supported only on the given ($m$) samples. Under certain conditions, we prove that the estimation error with this finitely-supported transport plan is also $\mathcal{O}(1/\sqrt{m})$. As far as we know, such error bounds that are free from the curse of dimensionality are not known for $\phi$-divergence regularized UOT. Finally, we discuss how the proposed estimator can be computed efficiently using accelerated gradient descent. Our experiments show that MMD-UOT consistently outperforms popular baselines, including KL-regularized UOT and MMD, in diverse machine learning applications.
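
To make the finitely-supported estimator and the accelerated-gradient solver concrete, here is a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: squared-MMD penalties (chosen so the objective is smooth), a Gaussian kernel, a squared Euclidean ground cost, uniform sample weights, one shared regularization weight `lam`, and a fixed step size from a crude Lipschitz bound. All function names are hypothetical.

```python
import numpy as np


def sq_dists(a, b):
    """Pairwise squared Euclidean distances between rows of a and rows of b."""
    d = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.maximum(d, 0.0)  # clip tiny negatives caused by round-off


def mmd_uot_objective(alpha, cost, gx, gy, mu, nu, lam):
    """Transport cost plus squared-MMD penalties on both marginals (assumed form)."""
    r = alpha.sum(axis=1) - mu  # row-marginal residual
    c = alpha.sum(axis=0) - nu  # column-marginal residual
    return float((alpha * cost).sum() + lam * (r @ gx @ r + c @ gy @ c))


def mmd_uot_apgd(x, y, lam=10.0, bandwidth=1.0, n_iter=500):
    """Accelerated projected gradient descent over nonnegative m-by-n plans."""
    m, n = len(x), len(y)
    mu, nu = np.full(m, 1.0 / m), np.full(n, 1.0 / n)  # uniform empirical weights
    cost = sq_dists(x, y)
    gx = np.exp(-sq_dists(x, x) / (2.0 * bandwidth**2))  # Gram matrix on x
    gy = np.exp(-sq_dists(y, y) / (2.0 * bandwidth**2))  # Gram matrix on y
    # Upper bound on the Lipschitz constant of the gradient.
    lip = 2.0 * lam * (n * np.linalg.norm(gx, 2) + m * np.linalg.norm(gy, 2))
    alpha = np.outer(mu, nu)  # start from the independent coupling
    z, t = alpha.copy(), 1.0
    for _ in range(n_iter):
        r = z.sum(axis=1) - mu
        c = z.sum(axis=0) - nu
        grad = cost + 2.0 * lam * ((gx @ r)[:, None] + (gy @ c)[None, :])
        alpha_next = np.maximum(z - grad / lip, 0.0)  # project onto alpha >= 0
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = alpha_next + ((t - 1.0) / t_next) * (alpha_next - alpha)  # momentum
        alpha, t = alpha_next, t_next
    return alpha, mmd_uot_objective(alpha, cost, gx, gy, mu, nu, lam)


rng = np.random.default_rng(0)
x = rng.normal(size=(20, 2))
plan, value = mmd_uot_apgd(x, x.copy())  # identical source and target samples
print(plan.shape, value)
```

The only constraint on the plan is nonnegativity, so the projection step is an elementwise clip; this is what keeps each iteration cheap compared with solvers that must handle equality constraints on the marginals. With identical source and target samples, the objective should approach 0 as the iterations proceed.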

Paper Structure

This paper contains 63 sections, 10 theorems, 67 equations, 10 figures, 11 tables, 2 algorithms.

Key Result

Theorem 4.1

(Duality) Whenever $c, k\in\mathcal{C}(\mathcal{X}\times\mathcal{X})$ and $\mathcal{X}$ is compact, we have that: Here, $\mathcal{G}_k(\lambda)\equiv\{g\in \mathcal{H}_k\ |\ \|g\|_k\le\lambda\}$.
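
For orientation, the display below recalls two standard facts about the set $\mathcal{G}_k(\lambda)$; it is not the statement of Theorem 4.1. The symbols $s_0$ and $t_0$ for the two measures being compared, and $\gamma_{\mathcal{F}}$ for the metric, are assumed notation that this excerpt does not fix. An integral probability metric (IPM) with generating set $\mathcal{F}$, and the MMD as the IPM generated by the unit RKHS ball, can be written as:

$$
\gamma_{\mathcal{F}}(s_0, t_0) \equiv \sup_{f\in\mathcal{F}} \left| \int f\,\mathrm{d}s_0 - \int f\,\mathrm{d}t_0 \right|,
\qquad
\mathrm{MMD}_k(s_0, t_0) = \gamma_{\mathcal{G}_k(1)}(s_0, t_0).
$$

Since $\mathcal{G}_k(\lambda) = \lambda\,\mathcal{G}_k(1)$, enlarging the ball by a factor $\lambda$ scales the resulting IPM by the same factor: $\gamma_{\mathcal{G}_k(\lambda)} = \lambda\,\mathrm{MMD}_k$. This is the scaled generating set of MMD that Figure 1 depicts as a disc.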

Figures (10)

  • Figure 1: For illustration, the generating set of Kantorovich-Wasserstein is depicted as a triangle, and the scaled generating set of MMD is depicted as a disc. The intersection represents the generating set of the IPM metric induced by MMD-UOT. (a) shows the special case in which our MMD-UOT metric recovers the sample-efficient MMD metric, (b) shows the special case in which our MMD-UOT metric reduces to the Kantorovich-Wasserstein metric that lifts the ground metric to measures, and (c) shows the resulting family of new UOT metrics, which are both sample-efficient and able to lift ground metrics to measures.
  • Figure 2: (a) Optimal transport plans of $\epsilon$KL-UOT and MMD-UOT; (b) Barycenter interpolating between Gaussian measures. For the chosen hyperparameter, the barycenters of $\epsilon$KL-UOT and MMD-UOT overlap and can be viewed as smooth approximations of the OT barycenter; (c) Objective vs. time plot comparing $\epsilon$KL-UOT solved using the popular Sinkhorn algorithm (Chizat et al., 2018; Pham et al., 2020) and MMD-UOT (\ref{eqn:kernot}) solved using APGD. A plot showing $\epsilon$KL-UOT's progress in the initial phase is given in Figure \ref{time-supp}.
  • Figure 3: Level sets of the distance function between a family of source distributions and a fixed target distribution, where the task is to find the source distribution closest to the target distribution using (a) MMD, (b) $\bar{W}_2$, (c) $\epsilon$KL-UOT, and (d) MMD-UOT. While all methods correctly identify the global minima (green arrows), the level sets with MMD-UOT and $\epsilon$KL-UOT show no local minima (encircled in red for MMD) and have fewer non-optimal stationary points (marked with black arrows) than (b).
  • Figure 4: Computation time: Convergence plots with $m=5000$ for the case of the same source and target measures, where the optimal objective is expected to be 0. Left: MMD-UOT problem (\ref{eqn:kernot}) solved with accelerated projected gradient descent. Right: $\epsilon$KL-UOT's convergence plot is shown separately. We observe that $\epsilon$KL-UOT's objective plateaus in 0.3 seconds. We note that our convergence to the optimal objective is faster than that of $\epsilon$KL-UOT.
  • Figure 5: Sample efficiency: Log-log plot of the optimal objective vs. the number of samples. The optimal objective values of the MMD-UOT and $\epsilon$KL-UOT formulations are shown as the number of samples increases. The data lies in 10 dimensions, and the source and target measures are both Uniform. MMD-UOT can be seen to have a better rate of convergence.
  • ...and 5 more figures

Theorems & Definitions (33)

  • Theorem 4.1
  • Corollary 4.2
  • Corollary 4.3
  • Corollary 4.4
  • Corollary 4.5
  • Corollary 4.6
  • Remark 4.7
  • Theorem 4.8
  • Remark 4.9
  • Theorem 4.10
  • ...and 23 more