Metric Space Magnitude for Evaluating the Diversity of Latent Representations

Katharina Limbeck, Rayna Andreeva, Rik Sarkar, Bastian Rieck

TL;DR

This work addresses the challenge of evaluating intrinsic diversity of latent representations without relying on ground-truth distributions. It introduces metric-space magnitude and its multi-scale variants MagArea and MagDiff to produce robust, scale-aware diversity summaries across text, image, and graph embeddings. The authors establish axiomatic advantages, provide efficient computational strategies (notably via Cholesky factorization), and validate the approach with extensive experiments showing superior performance over traditional diversity metrics and reliable mode-collapse/dropping detection. The results suggest magnitude-based diversity offers a principled, scalable tool for model evaluation and comparison in representation learning, with practical implications for debugging and regularization in generative and embedding-based systems.

Abstract

The magnitude of a metric space is a novel invariant that provides a measure of the 'effective size' of a space across multiple scales, while also capturing numerous geometrical properties, such as curvature, density, or entropy. We develop a family of magnitude-based measures of the intrinsic diversity of latent representations, formalising a novel notion of dissimilarity between magnitude functions of finite metric spaces. Our measures are provably stable under perturbations of the data, can be efficiently calculated, and enable a rigorous multi-scale characterisation and comparison of latent representations. We show their utility and superior performance across different domains and tasks, including (i) the automated estimation of diversity, (ii) the detection of mode collapse, and (iii) the evaluation of generative models for text, image, and graph data.
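
As a rough illustration of how the magnitude of a finite point cloud can be evaluated, the sketch below uses the standard definition: build the similarity matrix $Z_{ij} = e^{-t\,d(x_i, x_j)}$ and sum the entries of its inverse. It assumes Euclidean distances between distinct points, for which $Z$ is positive definite, so a Cholesky factor $Z = LL^\top$ gives $\mathds{1}^\top Z^{-1} \mathds{1} = \|L^{-1}\mathds{1}\|_2^2$. The function names `pairwise_euclidean` and `magnitude` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def pairwise_euclidean(X):
    """Return the n x n matrix of Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def magnitude(X, t=1.0):
    """Magnitude of the point cloud X (rows are points) at scale t.

    Assumption: the points are distinct and the metric is Euclidean, so the
    similarity matrix is positive definite and the Cholesky factor exists.
    """
    D = pairwise_euclidean(np.asarray(X, dtype=float))
    Z = np.exp(-t * D)                       # similarity matrix
    L = np.linalg.cholesky(Z)                # Z = L @ L.T
    y = np.linalg.solve(L, np.ones(len(Z)))  # y = L^{-1} 1
    return float(y @ y)                      # 1^T Z^{-1} 1 = ||y||^2
```

For two points at distance $d$, this should agree with the closed form $2 / (1 + e^{-td})$, which makes a convenient sanity check. Duplicate points make $Z$ singular, so they would need to be removed first.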

Paper Structure (45 sections, 6 theorems, 10 equations, 23 figures, 7 tables)

Key Result

Lemma A.1

Let $\|A\|_2 := \sup{\{\|Ax\|_2 : x \in \mathds{R}^n \text{ with } \|x\|_2 = 1\}}$ refer to the induced $2$-norm for matrices, and let $A, B$ be two $n \times n$ matrices with $\|A - B\|_2 \leq \epsilon$. Moreover, let $f(M) := \mathds{1}^\top M \mathds{1}$. Then $\|f(A) - f(B)\|_2 \leq n\epsilon$.
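
One way to see why this bound holds (a short sketch, not necessarily the authors' proof): $f$ is linear and scalar-valued, so the norm on the left is an absolute value, and the Cauchy-Schwarz inequality together with the definition of the induced $2$-norm gives

$$
|f(A) - f(B)| = |\mathds{1}^\top (A - B)\, \mathds{1}| \leq \|\mathds{1}\|_2 \, \|(A - B)\, \mathds{1}\|_2 \leq \|\mathds{1}\|_2^2 \, \|A - B\|_2 = n \, \|A - B\|_2 \leq n\epsilon,
$$

where the equality $\|\mathds{1}\|_2^2 = n$ holds because $\mathds{1}$ is the all-ones vector in $\mathds{R}^n$.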

Figures (23)

  • Figure 1: Overview of our diversity evaluation pipeline. (a) We start with an example of four latent spaces with $200$ points, varying in diversity. (b) The magnitude function measures the effective number of points at $t$, a scale of distance between observations. When the scale factor $t$ is close to zero, magnitude is close to 1, and a space effectively looks like one point. For large $t$, the number of effective points is noticeably higher and magnitude converges towards the cardinality. We find the approximate convergence scale, $t_{\text{conv}}$, at which magnitude almost equals the cardinality, and use it to define the evaluation interval $T$ across which diversity changes most notably. (c) The more diverse the space, the higher the value of its magnitude function. By construction, the spaces are ordered by decreasing diversity from $X_1$ to $X_4$, as we can see from the effective size of each space. We leverage this behaviour to define novel multi-scale indicators of diversity. (d) Our proposed measure of intrinsic diversity, MagArea, summarises the area under each magnitude function for reference-free diversity evaluation. (e) In a reference-based setting, we assess the difference in diversity using MagDiff, the area between two magnitude functions.
  • Figure 2: Magnitude detects curvature. Left: Magnitude functions for unit disks with curvature varying in $[-2, 2]$. Right: MagArea exhibits a linear relationship with curvature, indicating that it serves as an expressive predictor.
  • Figure 3: MagArea outperforms alternative diversity measures at predicting the ground-truth diversity of generated sentences, which is controlled by the softmax temperature, across 3 tasks and 5 embedding models. The baseline measures, AvgSim and GMStds, achieve lower $R^2$ scores. Points show the mean of the $R^2$ scores, while lines represent the standard deviations across $5$-fold cross-validation (repeated $10$ times).
  • Figure 4: MagArea correlates well with $\mathrm{dec}$, which indicates the true diversity. Here, we use mpnet embeddings for the resp dataset. $\rho$ denotes the rank correlation between MagArea and $\mathrm{dec}$ (95% bootstrap interval, $1000$ resamples).
  • Figure 5: Magnitude correctly detects that diversity decreases in the same manner under simultaneous and sequential mode dropping, outperforming recall and coverage. Lines show the mean values of each metric across $20$ resamples; shaded areas show the standard deviations.
  • ...and 18 more figures
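
The pipeline described in the Figure 1 caption can be sketched roughly as follows: evaluate the magnitude function on a grid of scales, locate an approximate convergence scale, and integrate. Everything specific here is an assumption made for illustration rather than the paper's exact definition: the 5% gap used to decide when magnitude "almost equals" the cardinality, the bisection search, the trapezoidal rule, the lack of any normalisation, and the choice of the larger convergence scale as the shared interval for the difference. The function names are hypothetical.

```python
import numpy as np


def magnitude(D, t):
    """Magnitude at scale t of a finite space with distance matrix D."""
    Z = np.exp(-t * np.asarray(D, dtype=float))  # similarity matrix
    weights = np.linalg.solve(Z, np.ones(len(Z)))
    return float(weights.sum())


def magnitude_function(D, ts):
    """Evaluate the magnitude function on a grid of positive scales."""
    return np.array([magnitude(D, t) for t in ts])


def convergence_scale(D, gap=0.05, iters=60):
    """Bisection for a scale where magnitude reaches (1 - gap) * n.

    Assumption: gap = 0.05 is an illustrative threshold.
    """
    target = (1.0 - gap) * len(D)
    hi = 1.0
    while magnitude(D, hi) < target:  # grow the bracket until it contains t_conv
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if magnitude(D, mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


def _trapezoid(ys, ts):
    """Trapezoidal rule on a possibly non-uniform grid."""
    return float(np.sum((ys[1:] + ys[:-1]) * np.diff(ts)) / 2.0)


def mag_area(D, num=50):
    """Area under the magnitude function up to the convergence scale."""
    t_conv = convergence_scale(D)
    ts = np.linspace(t_conv / num, t_conv, num)  # skip t = 0 (singular matrix)
    return _trapezoid(magnitude_function(D, ts), ts)


def mag_diff(D1, D2, num=50):
    """Area between two magnitude functions on a shared interval."""
    t_end = max(convergence_scale(D1), convergence_scale(D2))
    ts = np.linspace(t_end / num, t_end, num)
    gap = np.abs(magnitude_function(D1, ts) - magnitude_function(D2, ts))
    return _trapezoid(gap, ts)
```

With these assumptions, `mag_area` gives a reference-free summary of one space, while `mag_diff` compares two spaces and returns zero when their distance matrices coincide. Using a general linear solve instead of a Cholesky factorisation keeps the sketch short; it still requires distinct points so that the similarity matrix is invertible.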

Theorems & Definitions (16)

  • Definition 3.1: Magnitude of a metric space
  • Definition 3.2: Magnitude function
  • Definition 3.3: Area under the magnitude function, MagArea
  • Definition 3.4: Magnitude function difference, MagDiff
  • Definition 3.5: Convergence scale, $t_\text{conv}$
  • Definition A.1: Magnitude weights
  • Lemma A.1
  • proof
  • Lemma A.1
  • proof
  • ...and 6 more