Connecting Neural Models Latent Geometries with Relative Geodesic Representations

Hanlin Yu, Berfin Inal, Georgios Arvanitidis, Soren Hauberg, Francesco Locatello, Marco Fumero

TL;DR

The authors study how neural models trained independently on similar data come to parametrize the same latent manifold, and they use a pullback metric to define relative geodesic representations. They introduce an efficient proxy for geodesic energy based on straight-line latent paths and explore two choices of metric to pull back: one derived from classifier logits and one from Diet-based instance discrimination. Across autoencoders and vision foundation models, they report improved retrieval and zero-shot stitching performance, and strong instance identifiability in multimodal settings. The approach aligns diverse models in a scalable, task-relevant way with minimal supervision, offering a principled route toward cross-model communication and modular integration in large AI systems.

Abstract

Neural models learn representations of high-dimensional data on low-dimensional manifolds. Multiple factors, including stochasticity in the training process, model architectures, and additional inductive biases, may induce different representations, even when learning the same task on the same data. However, it has recently been shown that when a latent structure is shared between distinct latent spaces, relative distances between representations can be preserved, up to distortions. Building on this idea, we demonstrate that, by exploiting the differential-geometric structure of the latent spaces of neural models, it is possible to capture precisely the transformations between representational spaces trained on similar data distributions. Specifically, we assume that distinct neural models parametrize approximately the same underlying manifold, and introduce a representation based on the pullback metric that captures the intrinsic structure of the latent space while scaling efficiently to large models. We experimentally validate our method on model stitching and retrieval tasks, covering autoencoders and vision foundation discriminative models, across diverse architectures, datasets, and pretraining schemes.
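
To make the pullback idea concrete, the following is a minimal sketch, not the authors' implementation. It approximates the energy of a straight latent segment by measuring its image under a decoder, which is what the pulled-back metric computes, and stacks the energies to a set of anchors into a relative representation. The toy decoders, the linear map `transform`, the anchor points, and the finite-difference discretization are all assumptions made for illustration.

```python
import math

def decoder_1(z):
    """Hypothetical smooth decoder from a 2-D latent space to a 3-D output space."""
    x, y = z
    return (math.sin(x), math.cos(x) * y, x * y)

def transform(z):
    """Hypothetical invertible linear map T between two latent spaces."""
    x, y = z
    return (2.0 * x + y, y)

def decoder_2(z):
    """Second toy model: same outputs as decoder_1, expressed in the transformed space."""
    x, y = z
    return decoder_1(((x - y) / 2.0, y))  # applies the inverse of T first

def straight_line_energy(f, z_a, z_b, steps=64):
    """Discretised energy of the straight latent segment z_a -> z_b,
    measured with the metric pulled back through f (finite differences)."""
    energy = 0.0
    prev = f(z_a)
    for i in range(1, steps + 1):
        t = i / steps
        z_t = tuple(a + t * (b - a) for a, b in zip(z_a, z_b))
        cur = f(z_t)
        # ||f(z_i) - f(z_{i-1})||^2 / dt with dt = 1 / steps
        energy += sum((c - p) ** 2 for c, p in zip(cur, prev)) * steps
        prev = cur
    return energy

def relative_representation(f, z, anchors, steps=64):
    """Vector of straight-line energies from z to each anchor."""
    return [straight_line_energy(f, z, a, steps) for a in anchors]

# The same sample and anchors, seen through two different latent spaces.
anchors_1 = [(0.0, 0.0), (1.0, -1.0), (-0.5, 2.0)]
z_1 = (0.3, 0.7)
rep_1 = relative_representation(decoder_1, z_1, anchors_1)
rep_2 = relative_representation(
    decoder_2, transform(z_1), [transform(a) for a in anchors_1]
)
```

In this toy setting `rep_1` and `rep_2` agree up to floating-point error, because a linear `transform` maps straight segments to straight segments. For a nonlinear map between latent spaces, the straight-line energy is only an approximation of the geodesic energy.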

Paper Structure

This paper contains 46 sections, 1 theorem, 14 equations, 27 figures, 22 tables, 1 algorithm.

Key Result

Proposition 3.1

Let $\bm{\gamma}:[0,1]\to \mathcal{M}$ be a smooth curve on a Riemannian manifold $(\mathcal{M},G)$, let $(\mathcal{M}',G')$ be a reparameterization of the manifold, and let $\varphi:[0,1]\to[0,1]$ be a smooth, strictly increasing reparametrization of $\bm{\gamma}$. Setting $\bm{\gamma}'(\tau)=\bm{\gamma}(\dots)$ [...] Furthermore, the Riemannian arc length of $\bm{\gamma}$ is invariant under reparametrizations [...]
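
The arc-length part of the statement can be checked numerically. The sketch below illustrates the standard fact and does not reproduce the paper's proof: it compares a toy curve with a reparametrized copy under a constant metric. The metric `G`, the curve, and the map `phi` are hypothetical choices, and the energy is written without a factor of 1/2, which is a convention that may differ from the paper's.

```python
import math

# Hypothetical constant metric on R^2 (symmetric positive definite).
G = ((2.0, 0.5), (0.5, 1.0))

def curve(t):
    """Toy curve gamma(t) = (t, t^2) on [0, 1]."""
    return (t, t * t)

def phi(t):
    """Smooth, strictly increasing map of [0, 1] onto itself."""
    return 0.5 * (t + t * t)

def reparametrized(t):
    """The same curve traversed at a different speed: gamma(phi(t))."""
    return curve(phi(t))

def quad_form(d):
    """d^T G d for a 2-vector d."""
    return sum(d[i] * G[i][j] * d[j] for i in range(2) for j in range(2))

def length_and_energy(c, steps=2000):
    """Polyline approximations of the arc length and the energy of curve c."""
    length, energy = 0.0, 0.0
    dt = 1.0 / steps
    prev = c(0.0)
    for i in range(1, steps + 1):
        cur = c(i * dt)
        d = (cur[0] - prev[0], cur[1] - prev[1])
        q = quad_form(d)
        length += math.sqrt(q)   # sum of segment lengths
        energy += q / dt         # sum of squared speed times dt
        prev = cur
    return length, energy

L1, E1 = length_and_energy(curve)
L2, E2 = length_and_energy(reparametrized)
```

Here `L1` and `L2` agree up to discretization error, while `E1` and `E2` differ: the length depends only on the traced path, whereas the energy of a curve also depends on how fast the path is traversed.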

Figures (27)

  • Figure 1: Neural models trained on similar data learn parametrizations of the same manifold. NNs learn parametrizations ($D_1, D_2$) of the same underlying manifold $\mathcal{Y}$ up to isometries $T$. Pulling back the metric from $\mathcal{Y}$ makes relative geodesic representations invariant to transformations $T$ between latent spaces $\mathcal{Z}_1$ and $\mathcal{Z}_2$.
  • Figure 2: Pairwise latent-space energy matrices for (a) MNIST and (b) CIFAR-10. In each subfigure, the left heatmap shows the straight-line energy approximation and the right shows the energies of the ground-truth geodesic curves. The Spearman rank correlations between the two measures are $\rho=0.99$ for MNIST and $\rho=1.00$ for CIFAR-10, demonstrating near-perfect agreement.
  • Figure 3: Aligning latent spaces of autoencoders: MRR score as a function of the number of anchors for pairs of autoencoders trained with different initializations on the MNIST (left), FashionMNIST (center), and CIFAR10 (right) datasets. In green, we plot the performance of Moschella et al. (2023); in red and orange, the linear and orthogonal baselines, respectively; in blue, our method. The shaded area indicates the standard deviation across 5 different random sets of anchors. Relative geodesic consistently outperforms the baselines, obtaining peak performance.
  • Figure 4: Stitching on autoencoders: We visualize qualitative reconstructions of samples, stitching autoencoders trained with different initializations on MNIST (left), FashionMNIST (center), and CIFAR10 (right). The first two columns show reconstructions from the original models; the middle three columns show the baselines of Maiorca et al. (2024) and Moschella et al. (2023); the rightmost column is our method. Relative geodesic yields the best stitching results using just 5 anchors.
  • Figure 5: CUB accuracies (top) and symmetrized cosine MRR (bottom). RelGeo(Diet) and especially RelGeo($L2$) provide strong stitching accuracies, while RelGeo(Diet) maintains strong instance identifiability.
  • ...and 22 more figures
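
Several of the captions above report mean reciprocal rank (MRR). As a rough illustration of the metric only, the sketch below scores retrieval between two sets of representations with cosine similarity, assuming that `queries[i]` should retrieve `gallery[i]` and that no vector is all zeros. The function names and the symmetrization by averaging both directions are assumptions, not the paper's evaluation code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_reciprocal_rank(queries, gallery):
    """MRR where the correct match for queries[i] is gallery[i]."""
    total = 0.0
    for i, q in enumerate(queries):
        sims = [cosine(q, g) for g in gallery]
        # Rank of the true match: 1 + number of items strictly more similar.
        rank = 1 + sum(s > sims[i] for s in sims)
        total += 1.0 / rank
    return total / len(queries)

def symmetrized_mrr(reps_a, reps_b):
    """Average of the MRR in both retrieval directions (an assumed convention)."""
    return 0.5 * (
        mean_reciprocal_rank(reps_a, reps_b)
        + mean_reciprocal_rank(reps_b, reps_a)
    )
```

With relative representations, `queries` and `gallery` would hold the anchor-based vectors of the same samples computed in two different models, so a score near 1 means each sample is matched to itself across models.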

Theorems & Definitions (3)

  • Proposition 3.1
  • proof
  • proof