Connecting Neural Models Latent Geometries with Relative Geodesic Representations
Hanlin Yu, Berfin Inal, Georgios Arvanitidis, Soren Hauberg, Francesco Locatello, Marco Fumero
TL;DR
The authors address how neural models trained independently on similar data instantiate approximately the same latent manifold, using a pullback metric to define relative geodesic representations. They introduce an efficient geodesic-energy proxy based on straight-line latent paths and explore two principled metric choices (classifier logits and Diet-based instance discrimination) to pull back intrinsic geometry. Across autoencoders and vision foundation models, they demonstrate improved retrieval and zero-shot stitching performance, and show strong identifiability in multimodal settings. This geometrically grounded approach enables scalable, task-relevant alignment of diverse models with minimal supervision. It offers a principled path toward robust cross-model communication and modular integration in large AI systems.
Abstract
Neural models learn representations of high-dimensional data on low-dimensional manifolds. Multiple factors, including stochasticities in the training process, model architectures, and additional inductive biases, may induce different representations, even when learning the same task on the same data. However, it has recently been shown that when a latent structure is shared between distinct latent spaces, relative distances between representations can be preserved, up to distortions. Building on this idea, we demonstrate that, by exploiting the differential-geometric structure of the latent spaces of neural models, it is possible to capture precisely the transformations between representational spaces of models trained on similar data distributions. Specifically, we assume that distinct neural models parametrize approximately the same underlying manifold, and introduce a representation based on the pullback metric that captures the intrinsic structure of the latent space, while scaling efficiently to large models. We experimentally validate our method on model stitching and retrieval tasks, covering autoencoders and vision foundation discriminative models, across diverse architectures, datasets, and pretraining schemes.
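To make the idea of a straight-line geodesic-energy proxy concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a differentiable map `f` from the latent space to an output space (for example a decoder or classifier logits) and approximates the pullback curve energy of the straight segment between two latent points by finite differences of `f` along that segment. The names `straight_line_energy`, `relative_geodesic_representation`, `anchors`, and `n_steps` are hypothetical and chosen only for illustration.

```python
import numpy as np

def straight_line_energy(f, z_a, z_b, n_steps=16):
    """Approximate the pullback energy of the straight latent path z_a -> z_b.

    Uses E = 1/2 * integral ||d/dt f(gamma(t))||^2 dt with gamma the straight
    segment, discretized with finite differences of f along the path.
    `f` is assumed to accept a batch of latents of shape (n, d_latent).
    """
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    # Points on the straight segment, shape (n_steps + 1, d_latent).
    path = (1.0 - ts)[:, None] * z_a[None, :] + ts[:, None] * z_b[None, :]
    outputs = f(path)                      # shape (n_steps + 1, d_out)
    diffs = np.diff(outputs, axis=0)       # consecutive output differences
    dt = 1.0 / n_steps
    return 0.5 * np.sum(diffs ** 2) / dt

def relative_geodesic_representation(f, z, anchors, n_steps=16):
    """Represent z by its straight-line energies to a set of anchor latents."""
    return np.array([straight_line_energy(f, z, a, n_steps) for a in anchors])
```

Under these assumptions, each sample is described by its energies to a shared set of anchor samples rather than by its raw latent coordinates. If two models differ only by a reparametrization of the latent space that leaves the outputs of `f` along the path unchanged, the resulting vectors coincide.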
