Magnitude Distance: A Geometric Measure of Dataset Similarity

Sahel Torkamani, Henry Gouk, Rik Sarkar

TL;DR

This work introduces magnitude distance, a geometry-based metric between finite datasets built from the magnitude of metric spaces, featuring a tunable scale parameter $t$ that balances global versus local data structure. By leveraging a kernelized similarity matrix and its inverse, the approach maintains discriminability in high dimensions and offers principled robustness to outliers. The authors establish core properties, including symmetry, non-negativity, and a scale-dependent limiting behavior, and show how magnitude distance can serve as a training objective in push-forward generative models, exemplified by the Magnitude Generative Network (MagGN). Through theoretical analysis and empirical studies on MNIST, CIFAR-10, and CelebA, the method demonstrates meaningful dataset discrimination, training efficiency, and improved downstream performance, suggesting broad applicability in hypothesis testing, distribution shift robustness, and privacy-aware data analysis.

Abstract

Quantifying the distance between datasets is a fundamental question in mathematics and machine learning. We propose magnitude distance, a novel distance metric defined on finite datasets using the notion of the magnitude of a metric space. The proposed distance incorporates a tunable scaling parameter, $t$, that controls the sensitivity to global structure (small $t$) and finer details (large $t$). We prove several theoretical properties of magnitude distance, including its limiting behavior across scales and conditions under which it satisfies key metric properties. In contrast to classical distances, we show that magnitude distance remains discriminative in high-dimensional settings when the scale is appropriately tuned. We further demonstrate how magnitude distance can be used as a training objective for push-forward generative models. Our experimental results support our theoretical analysis and demonstrate that magnitude distance provides meaningful signals, comparable to established distance-based generative approaches.

Paper Structure (26 sections, 18 theorems, 38 equations, 9 figures, 3 tables)

Key Result

Theorem 4.1

Let $X$ be a finite collection of points in Euclidean space $\mathbb{R}^D$, and define the similarity matrix as before. Then:

Figures (9)

  • Figure 1: Impact of the scaling parameter $t$ on the magnitude distance between samples of $N(0,1)$ and $N(\mu ,1)$ in $100$ dimensions. Each plot shows the magnitude distance computed from 200 samples of $N(0,1)$ to 200 samples of $N(\mu ,1)$ with $t = 0.1$ (blue), $0.2$ (purple), $0.3$ (yellow), $0.4$ (red), $0.5$ (green), $0.6$ (dark purple). Plots \ref{plot:normalized_mag} and \ref{plot:standard_mag} show the normalized and standard magnitude distances, respectively. As the mean difference between the samples increases, both magnitude distances also increase. However, the standard magnitude distance converges toward different limits depending on $t$, while the normalized magnitude distance consistently converges to $1$, regardless of $t$. By Theorem \ref{theo:magdist_over_t}, these limits are also bounded above by the cardinality of the symmetric difference of the samples.
  • Figure 2: From two Gaussian distributions with identical covariance and a mean shift of $2$, we compute the empirical MMD and magnitude distance between $500$ samples from each distribution over $100$ independent trials. The plot compares the mean empirical MMD, with Gaussian kernel bandwidths $\sigma=1$ and $\sigma=1/\sqrt{D}$, against the mean normalized magnitude distance for fixed kernel scales $t \in \{0.01, 0.1\}$ and adaptive scales $t = 1/D$ and $t = 1/\sqrt{D}$. MMD in both settings rapidly collapses toward zero as the dimension increases. Wasserstein distance decreases with dimension but remains comparatively stable. Magnitude distance with $t=1/D$ exhibits mis-scaling in low dimensions, producing overly small distances. For fixed scaling or scaling $t = 1/\sqrt{D}$, magnitude distance remains stable across dimensions, with only gradual changes.
  • Figure 3: From two Gaussian distributions with identical covariance and a mean shift of $2$, we compute empirical distances between $500$ samples from each distribution over $100$ independent trials. The plot shows the coefficient of variation (log-scale) of Wasserstein distance, MMD with Gaussian kernel bandwidths $\sigma = 1$ and $\sigma = 1/\sqrt{D}$, and normalized magnitude distance with fixed kernel scales $t \in \{0.01, 0.1\}$ and adaptive scales $t = 1/D$ and $t = 1/\sqrt{D}$. MMD shows unstable behavior across kernel choices, and Wasserstein distance has consistently higher relative variability than magnitude distance. Moreover, magnitude distance with adaptive scaling maintains lower relative variability across dimensions while avoiding collapse to $0$. For fixed kernel scales, magnitude distance can initially behave similarly to adaptive scaling; however, the outcome depends on the chosen scale. The larger fixed scale $t = 0.1$ enters an over-localized regime at sufficiently high dimensions, where off-diagonal kernel similarities vanish and the distance collapses abruptly. The smaller fixed scale $t = 0.01$ preserves stability over a wider range of dimensions, although it is also expected to fail eventually as dimensionality continues to increase.
  • Figure 4: Outlier robustness in 2D: the baseline dataset $B \sim \mathcal{N}([0, 0], 1)$ (blue points), the set $Y \sim \mathcal{N}([2, 2], 1)$ (yellow points), and its noisy variant $Y^{*}$, which incorporates the outliers $Y^\prime$ (red points).
  • Figure 5: Outlier robustness under Huber contamination. We compare empirical Wasserstein distance and magnitude distance with $t=0.001$ between $500$ samples drawn from a contaminated distribution $Q = (1-\epsilon)P + \epsilon R$ and $500$ from the clean distribution $P$. The plot shows three contamination levels, $\epsilon \in \{ 0.01, 0.05, 0.1 \}$. As the outlier radius increases, the Wasserstein distance grows, while magnitude distance remains close to $0$.
  • ...and 4 more figures

Theorems & Definitions (32)

  • Definition 3.1: Metric Magnitude
  • Definition 3.2: Magnitude function
  • Theorem 4.1
  • Definition 4.2: Magnitude Distance
  • Definition 5.1: Magnitude Equivalence
  • Theorem 5.2
  • Theorem 5.3
  • Proposition 5.4
  • Theorem 5.5
  • Proposition 2.4.3 (leinster2013magnitude)
  • ...and 22 more
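
Definitions 3.1 and 3.2 in the list above name the metric magnitude and the magnitude function, but their statements are not reproduced on this page. As a rough illustration only, the sketch below uses the standard definition from the magnitude literature: for points $x_1, \dots, x_n$ and scale $t$, form the similarity matrix $Z_{ij} = e^{-t\,d(x_i, x_j)}$, solve $Zw = \mathbf{1}$ for the weighting $w$, and take the magnitude to be $\sum_i w_i$. The function names (`similarity_matrix`, `solve`, `magnitude`) are hypothetical, the Euclidean metric is an assumption, and this is not the authors' implementation. It also does not compute the magnitude distance of Definition 4.2, whose formula is not given here.

```python
import math


def similarity_matrix(points, t):
    """Z[i][j] = exp(-t * ||x_i - x_j||), assuming the Euclidean metric."""
    return [[math.exp(-t * math.dist(p, q)) for q in points] for p in points]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copy rows so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def magnitude(points, t):
    """Magnitude of the scaled point set: the sum of the weighting w with Z w = 1."""
    z = similarity_matrix(points, t)
    w = solve(z, [1.0] * len(points))
    return sum(w)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    for t in (0.01, 0.1, 1.0, 10.0):
        print(t, magnitude(pts, t))
```

For two points at distance $d$ the weighting can be solved by hand, giving magnitude $2 / (1 + e^{-td})$. This tends to $1$ as $t \to 0$ and to $2$ as $t \to \infty$, which matches the abstract's description of small $t$ as sensitive to global structure and large $t$ as sensitive to finer details. In general, off-diagonal similarities vanish at large $t$, so $Z$ approaches the identity and the magnitude approaches the number of points.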