Mutual Regression Distance

Dong Qiao; Jicong Fan

TL;DR

This work addresses the limitation of pairwise-distance metrics like OT and MMD in capturing manifold structure by introducing Mutual Regression Distance (MRD), a regression-based distance derived from a constrained mutual regression between two sample sets. MRD and its variants (tightened MRD, simplified MRD, Kernel MRD) are shown to be convex, permutation-invariant pseudometrics with robustness guarantees, and they offer computational advantages over Wasserstein distances. The paper demonstrates MRD’s practicality across distribution transformation, discrete distribution clustering, deep generative modeling (SMRDGAN), and domain adaptation, highlighting improved performance and efficiency. Overall, MRD provides a principled, manifold-aware alternative for measuring dissimilarity between distributions with broad applicability in learning, clustering, and transfer tasks.

Abstract

The maximum mean discrepancy and the Wasserstein distance are popular distance measures between distributions and play important roles in many machine learning problems such as metric learning, generative modeling, domain adaptation, and clustering. However, because they are functions of pairwise distances between data points in the two distributions, they do not exploit potential manifold properties of the data, such as smoothness, and hence are not effective at measuring the dissimilarity between two distributions that take the form of manifolds. In this paper, we propose a novel distance called Mutual Regression Distance (MRD), induced by a constrained mutual regression problem, which can exploit the manifold property of data. We prove that MRD is a pseudometric that satisfies almost all the axioms of a metric. Since optimizing the original MRD is costly, we provide a tightened MRD and a simplified MRD, based on which a heuristic algorithm is established. We also provide kernel variants of MRDs that are more effective in handling nonlinear data. Our MRDs, especially the simplified MRDs, have much lower computational complexity than the Wasserstein distance. We provide theoretical guarantees, such as robustness, for MRDs. Finally, we apply MRDs to distribution clustering, generative models, and domain adaptation. The numerical results demonstrate the effectiveness and superiority of MRDs compared to the baselines.

Paper Structure (35 sections, 12 theorems, 84 equations, 4 figures, 5 tables, 3 algorithms)

Introduction
Preliminaries and Related Work
Wasserstein distance
Sinkhorn distance
Maximum mean discrepancy
Mutual Regression Distance
Mutual regression problem
Optimization
Kernel MRD
Theoretical Analysis
Applications of MRD
Distribution transformation
Discrete distribution spectral clustering using MRD
SMRDGAN
Domain adaptation
...and 20 more sections

Key Result

Lemma 3.2

Assume $\bm{S}_1,\bm{S}_2 \in \mathcal{S}_{2}^{\le 1}$ and $\lambda \in [0, 1]$. Let $\bm{S} = \lambda \bm{S}_1 + (1 - \lambda)\bm{S}_2$. Then $\bm{S} \in \mathcal{S}_{2}^{\le 1}$.
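
This excerpt does not define $\mathcal{S}_{2}^{\le 1}$. Assuming it denotes the set of matrices whose spectral norm is at most 1, the lemma follows from the triangle inequality: $\|\lambda \bm{S}_1 + (1-\lambda)\bm{S}_2\|_2 \le \lambda\|\bm{S}_1\|_2 + (1-\lambda)\|\bm{S}_2\|_2 \le 1$. The sketch below is a numerical sanity check of that reading, not code from the paper; the helper names, matrix size, and number of trials are illustrative choices.

```python
import numpy as np


def spectral_norm(S):
    """Largest singular value of a 2-D array."""
    return np.linalg.norm(S, 2)


def scale_into_unit_ball(S):
    """Rescale S so its spectral norm is at most 1 (assumed feasible set)."""
    norm = spectral_norm(S)
    return S / norm if norm > 1.0 else S


def worst_convex_combination_norm(n=5, trials=200, seed=0):
    """Return the largest spectral norm seen over random convex combinations."""
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(trials):
        # Two random matrices, each rescaled into the assumed feasible set.
        S1 = scale_into_unit_ball(rng.standard_normal((n, n)))
        S2 = scale_into_unit_ball(rng.standard_normal((n, n)))
        lam = rng.uniform(0.0, 1.0)
        S = lam * S1 + (1.0 - lam) * S2
        worst = max(worst, spectral_norm(S))
    return worst


worst_norm = worst_convex_combination_norm()
# The lemma (under the stated assumption) predicts worst_norm <= 1,
# up to floating-point round-off.
print(worst_norm <= 1.0 + 1e-9)
```

A random check like this cannot prove the lemma. It only confirms that the triangle-inequality argument matches what is observed numerically.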

Figures (4)

Figure 1: Comparison of constrained optimization solved by CVX and Algorithm 1
Figure 2: Distribution transformation on a toy example
Figure 3: Samples from WGAN-GP, SMMDGAN, and our SMRDGAN. Top: $32\times 32$ CIFAR-10; bottom: $64\times 64$ CelebA.
Figure 4: Samples from WGAN-GP, SMMDGAN, and our SMRDGAN. Top: $32\times 32$ MNIST; bottom: $32\times 32$ Fashion-MNIST.

Theorems & Definitions (33)

Definition 3.1: Mutual regression problem
Lemma 3.2: Feasible set is convex
Lemma 3.3
Definition 3.4: MRD
Theorem 3.5: MRD is a pseudometric
Example 3.6
Definition 3.7: Tightened MRD
Definition 3.8: Simplified MRD
Lemma 3.9: Upper bound on regularization coefficient
Definition 3.10: Kernel MRD
...and 23 more
