Multi-Way Representation Alignment

Akshit Achara, Tatiana Gaintseva, Mateo Mahaut, Pritish Chakraborty, Viktor Stenby Johansson, Melih Barsbey, Emanuele Rodolà, Donato Crisostomi

TL;DR

The paper addresses the challenge of aligning representations from multiple independently trained models by introducing a shared universe framework. It adapts Generalized Procrustes Analysis (GPA) to build an isometric universal space, but shows that pure geometry preservation can impair retrieval performance, motivating a consensus-based correction. The authors propose Geometry-Corrected Procrustes Alignment (GCPA), which uses GPA as a geometric scaffold and applies a shared residual correction to bridge geometry and cross-model agreement. Across multilingual, cross-camera, and multimodal benchmarks, GCPA achieves state-of-the-art any-to-any retrieval while enabling scalable universe extension and robust aggregation, demonstrating practical interoperability for model stitching, cross-modal transfer, and zero-shot composition.

Abstract

The Platonic Representation Hypothesis suggests that independently trained neural networks converge to increasingly similar latent spaces. However, current strategies for mapping these representations are inherently pairwise, scaling quadratically with the number of models and failing to yield a consistent global reference. In this paper, we study the alignment of $M \ge 3$ models. We first adapt Generalized Procrustes Analysis (GPA) to construct a shared orthogonal universe that preserves the internal geometry essential for tasks like model stitching. We then show that strict isometric alignment is suboptimal for retrieval, where agreement-maximizing methods like Canonical Correlation Analysis (CCA) typically prevail. To bridge this gap, we finally propose Geometry-Corrected Procrustes Alignment (GCPA), which establishes a robust GPA-based universe followed by a post-hoc correction for directional mismatch. Extensive experiments demonstrate that GCPA consistently improves any-to-any retrieval while retaining a practical shared reference space.
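
The GPA step described above can be illustrated with a minimal sketch of classical Generalized Procrustes Analysis restricted to orthogonal maps. It assumes that all models share the same embedding dimension and that rows of each view are paired anchor samples. The names `orthogonal_procrustes` and `gpa` are illustrative and not taken from the paper, whose adaptation may differ in normalization, initialization, or stopping rule.

```python
import numpy as np


def orthogonal_procrustes(X, Y):
    """Return the orthogonal R minimizing ||X @ R - Y||_F (closed form via SVD)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def gpa(views, n_iter=50, tol=1e-10):
    """Alternate between fitting one orthogonal map per view and updating the mean.

    views: list of (n, d) arrays with row-wise correspondence (an assumption).
    Returns the list of (d, d) orthogonal maps and the (n, d) shared reference.
    """
    reference = views[0].copy()  # initialize the universe with the first view
    prev_loss = np.inf
    for _ in range(n_iter):
        # Step 1: align every view to the current reference.
        maps = [orthogonal_procrustes(X, reference) for X in views]
        aligned = [X @ R for X, R in zip(views, maps)]
        # Step 2: the new reference is the mean of the aligned views.
        reference = np.mean(aligned, axis=0)
        loss = sum(np.linalg.norm(A - reference) ** 2 for A in aligned)
        if prev_loss - loss < tol:
            break
        prev_loss = loss
    return maps, reference
```

Because each map is orthogonal, pairwise distances and angles inside every model's space are unchanged after mapping into the reference, which is the geometry-preservation property the abstract refers to.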

Paper Structure (45 sections, 6 theorems, 36 equations, 11 figures, 5 tables, 1 algorithm)

Key Result

Proposition 3.1

The global minimizer of the GCCA objective is obtained via a spectral decomposition of a matrix constructed from cross-view correlations (Theorem B.1). Among linear shared-basis embeddings satisfying the constraint, this solution globally minimizes the total pairwise mismatch energy.
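
As a rough illustration of a spectral GCCA solution, the sketch below implements one standard formulation (MAX-VAR GCCA), in which the shared embedding is given by the top eigenvectors of a sum of per-view projection matrices. This is an assumption made for illustration: the paper's exact objective, constraint, and matrix construction may differ, and the name `gcca_maxvar` and the ridge term `reg` are hypothetical.

```python
import numpy as np


def gcca_maxvar(views, k, reg=1e-6):
    """MAX-VAR GCCA sketch. views: list of (n, d_m) arrays with paired rows."""
    n = views[0].shape[0]
    centered = [X - X.mean(axis=0, keepdims=True) for X in views]

    # Sum of ridge-regularized projection matrices onto each view's column space.
    M = np.zeros((n, n))
    for X in centered:
        C = X.T @ X + reg * np.eye(X.shape[1])
        M += X @ np.linalg.solve(C, X.T)
    M = (M + M.T) / 2.0  # symmetrize against round-off

    # eigh returns eigenvalues in ascending order; keep the top-k eigenvectors.
    _, eigvecs = np.linalg.eigh(M)
    G = eigvecs[:, -k:][:, ::-1]  # (n, k) shared embedding with G.T @ G = I

    # Per-view linear maps: ridge regression of the shared embedding on each view.
    maps = [
        np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ G)
        for X in centered
    ]
    return G, maps
```

Unlike the orthogonal maps of GPA, the maps returned here are general linear maps, so they can rescale directions to maximize agreement across views at the cost of distorting each model's internal geometry.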

Figures (11)

  • Figure 1: Pairwise alignment (left) learns a separate map for each ordered pair, which does not enforce consistency when maps are composed. Universe alignment (right) learns one map per model into a shared reference $U$, enabling translation between models by composition.
  • Figure 2: Multi-way alignment stabilizes fragile connections. On edge-heavy CIFAR-100, we isolate a "weak" model pair with poor alignment. Progressively expanding the universe with robust models and refitting it yields a monotonic increase in stitching accuracy between the original fragile pair.
  • Figure 3: Cross-model probing on CIFAR-100. Adding a new model by fitting only $\Omega_{M+1}$ into a fixed universe (GPA-ADD) approaches refitting the universe (GPA-REFIT) and outperforms pairwise (PW) alignment. To cover diverse scenarios, we use four base model sets: the first two (from the left) contain three models each, and the last two contain five models each.
  • Figure 4: Cross-lingual retrieval on TED-Multi (rank-1). GCPA outperforms GCCA, GPA, and pairwise orthogonal alignment.
  • Figure 5: Robustness to correspondence noise on TED-Multi. Rank-1 retrieval accuracy (%) on the clean test split relative to the unshuffled baseline. Solid bars average over the six directed pairs within the triad; hatched bars average over all directed pairs that involve at least one shuffled language. Results are averaged over three disjoint triads.
  • ...and 6 more figures
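
The contrast drawn in Figure 1 can be made concrete with a small sketch. It assumes a row-vector convention in which model $m$ is mapped into the universe by an orthogonal matrix $\Omega_m$, so that translation from model $i$ to model $j$ is $\Omega_i \Omega_j^\top$. The helper names `random_orthogonal`, `translate`, and `num_maps` are hypothetical, and the random matrices stand in for learned maps.

```python
import numpy as np


def random_orthogonal(d, rng):
    """Draw a random (d, d) orthogonal matrix as a stand-in for a learned map."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return Q


def translate(omega_src, omega_dst):
    """Translation source -> universe -> destination: x maps to x @ omega_src @ omega_dst.T."""
    return omega_src @ omega_dst.T


def num_maps(M):
    """Maps to learn for M models: one per ordered pair versus one per model."""
    return {"pairwise": M * (M - 1), "universe": M}


rng = np.random.default_rng(0)
omegas = [random_orthogonal(6, rng) for _ in range(3)]

# Composing 0 -> 1 and 1 -> 2 equals the direct translation 0 -> 2,
# because omega_1.T @ omega_1 is the identity for orthogonal maps.
t01 = translate(omegas[0], omegas[1])
t12 = translate(omegas[1], omegas[2])
t02 = translate(omegas[0], omegas[2])
print(np.allclose(t01 @ t12, t02))
print(num_maps(5))
```

With orthogonal universe maps, composition consistency holds by construction, whereas independently fitted pairwise maps carry no such guarantee. The map count also grows linearly in the number of models rather than quadratically.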

Theorems & Definitions (10)

  • Proposition 3.1: GCCA minimizes squared discrepancy
  • Proposition 3.2
  • Theorem B.1: Optimal multi-space alignment under squared cross-space discrepancy
  • proof
  • Corollary B.2
  • proof
  • Corollary B.3
  • proof
  • Corollary B.4
  • proof