Mutual Regression Distance
Dong Qiao, Jicong Fan
TL;DR
This work addresses the limited ability of pairwise-distance measures such as optimal transport (OT) and maximum mean discrepancy (MMD) to capture manifold structure by introducing Mutual Regression Distance (MRD), a regression-based distance derived from a constrained mutual regression between two sample sets. MRD and its variants (tightened MRD, simplified MRD, Kernel MRD) are shown to be convex, permutation-invariant pseudometrics with robustness guarantees, and they have lower computational complexity than the Wasserstein distance. The paper demonstrates MRD's practicality across distribution transformation, discrete distribution clustering, deep generative modeling (SMRDGAN), and domain adaptation, reporting improved performance and efficiency over the baselines. Overall, MRD provides a principled, manifold-aware alternative for measuring dissimilarity between distributions, applicable to learning, clustering, and transfer tasks.
Abstract
The maximum mean discrepancy and Wasserstein distance are popular distance measures between distributions and play important roles in many machine learning problems such as metric learning, generative modeling, domain adaptation, and clustering. However, since they are functions of pairwise distances between data points in two distributions, they do not exploit potential manifold properties of the data, such as smoothness, and hence are not effective at measuring the dissimilarity between two distributions that take the form of manifolds. In this paper, in contrast to existing measures, we propose a novel distance called Mutual Regression Distance (MRD), induced by a constrained mutual regression problem, which can exploit the manifold property of data. We prove that MRD is a pseudometric that satisfies almost all the axioms of a metric. Since the optimization of the original MRD is costly, we provide a tight MRD and a simplified MRD, based on which a heuristic algorithm is established. We also provide kernel variants of MRDs that are more effective in handling nonlinear data. Our MRDs, especially the simplified MRDs, have much lower computational complexity than the Wasserstein distance. We provide theoretical guarantees, such as robustness, for MRDs. Finally, we apply MRDs to distribution clustering, generative models, and domain adaptation. The numerical results demonstrate the effectiveness and superiority of MRDs compared to the baselines.
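To make concrete the statement that the maximum mean discrepancy is a function of pairwise distances, the following is a minimal sketch of the standard biased (V-statistic) estimate of squared MMD with a Gaussian kernel. It is not part of MRD and is not taken from the paper's code. The function names, the `bandwidth` parameter, and the toy point sets are illustrative assumptions.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def gaussian_kernel_mean(A, B, bandwidth):
    """Average Gaussian kernel value over all pairs (a, b), a in A, b in B.

    Only the pairwise squared distances enter the computation.
    """
    total = 0.0
    for a in A:
        for b in B:
            total += math.exp(-sq_dist(a, b) / (2.0 * bandwidth ** 2))
    return total / (len(A) * len(B))


def mmd_squared(X, Y, bandwidth=1.0):
    """Biased estimate of squared MMD between sample sets X and Y.

    The bandwidth default is an arbitrary choice for illustration.
    """
    return (gaussian_kernel_mean(X, X, bandwidth)
            + gaussian_kernel_mean(Y, Y, bandwidth)
            - 2.0 * gaussian_kernel_mean(X, Y, bandwidth))


# Toy data (hypothetical): three points and a translated copy of them.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
Y_shifted = [(x + 3.0, y + 3.0) for x, y in X]

print(mmd_squared(X, X))          # identical sample sets
print(mmd_squared(X, Y_shifted))  # translated sample set
```

Because every term is an average over point pairs, the estimate does not change when the samples are reordered, and it sees the data only through inter-point distances. This is the property the abstract identifies as the reason such measures ignore manifold structure such as smoothness.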
