From Bricks to Bridges: Product of Invariances to Enhance Latent Space Communication

Irene Cannistraci, Luca Moschella, Marco Fumero, Valentino Maiorca, Emanuele Rodolà

TL;DR

This work introduces a versatile method to directly incorporate a set of invariances into the representations, constructing a product space of invariant components on top of the latent representations without requiring prior knowledge about the optimal invariance to infuse.

Abstract

It has been observed that representations learned by distinct neural networks conceal structural similarities when the models are trained under similar inductive biases. From a geometric perspective, identifying the classes of transformations and the related invariances that connect these representations is fundamental to unlocking applications, such as merging, stitching, and reusing different neural modules. However, estimating task-specific transformations a priori can be challenging and expensive due to several factors (e.g., weight initialization, training hyperparameters, or data modality). To this end, we introduce a versatile method to directly incorporate a set of invariances into the representations, constructing a product space of invariant components on top of the latent representations without requiring prior knowledge about the optimal invariance to infuse. We validate our solution on classification and reconstruction tasks, observing consistent latent similarity and downstream performance improvements in a zero-shot stitching setting. The experimental analysis comprises three modalities (vision, text, and graphs), twelve pretrained foundational models, nine benchmarks, and several architectures trained from scratch.

Paper Structure (22 sections, 8 equations, 14 figures, 31 tables)

This paper contains 22 sections, 8 equations, 14 figures, 31 tables.

Figures (14)

  • Figure 1: CKA similarity between pretrained models on F-MNIST measured infusing invariances to specific classes of transformations (Conformal, Euclidean, Orthogonal). Each bar reports the distribution of similarity to the other models. The score diversity highlights the absence of a universal transformation connecting all latent spaces.
  • Figure 2: Framework description. Given two latent spaces $\mathcal{Z},\mathcal{Z}'$ related by an unknown transformation $T$ (resp. $T^{-1}$), we assume that there exists a manifold $\mathcal{M}$ where samples in $\mathcal{Z},\mathcal{Z}'$ coincide when projected onto $\mathcal{M}$ via $\pi_\mathcal{M}$. We approximate $\mathcal{M}$ by building a product space $\tilde{\mathcal{M}}$, where each component space is computed using a similarity function $d_i$ invariant to a specific, known class of transformations. Combining the resulting spaces, we recover a representation $r$ that should approximate $\pi_\mathcal{M}$.
  • Figure 3: Latent Spaces Cross-Seed Similarity. Cross-seed Pearson correlation of latent spaces for models trained on MNIST (left) and CIFAR-100 (right) until convergence. Notably, no single projection consistently outperforms the others across all settings. The projection is not displayed to improve visualization.
  • Figure 4: Latent Spaces Cross-Architecture Similarity. Linear similarity of latent spaces across several pretrained models on CIFAR-10 (left) and TREC (right). Each bar reports the distribution of space similarities to the other models while infusing a specific invariance. No single projection consistently outperforms the others across all configurations.
  • Figure 5: Reconstruction Qualitative Results on CIFAR-100. The last two sets of columns ($D_1 \circ E_1$ and $D_2 \circ E_2$) are the end-to-end models, each trained with its own initialization seed, while the first two ($D_1 \circ E_2$ and $D_2 \circ E_1$) illustrate the outputs of the zero-shot stitching of these independently trained models. The first row displays the source images, the two subsequent rows show the results from distinct combinations of projections aggregated by sum, and the last one shows the outputs of a baseline model that does not incorporate our methodology. Interestingly, when using the projection (second to last row), the reconstructed images are blurred, but when removing the space from the aggregation (third row), reconstruction quality drops, indicating that this space captures information the others miss.
  • ...and 9 more figures
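
Figure 1 reports CKA similarity between pretrained models. For reference, here is a minimal sketch of the linear variant of CKA in NumPy. The captions do not say which CKA kernel the authors use, so the linear form and the name `linear_cka` are assumptions made for illustration only.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two representations of the same n samples.

    x has shape (n, d1) and y has shape (n, d2); rows must be aligned,
    i.e. row i of x and row i of y encode the same input sample.
    """
    # Center every feature column.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))
```

Linear CKA is unchanged by orthogonal transformations and isotropic rescaling of either input, so two latent spaces that differ only by a rotation and a global scale score 1.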

Theorems & Definitions (1)

  • Definition: Product projection
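
The body of the definition is not reproduced in this listing. Going only by the Figure 2 caption (several similarity functions $d_i$, each invariant to a known class of transformations, combined into one representation $r$), a product of invariant components could be sketched as below. This is an illustration under assumptions, not the authors' implementation: the use of a fixed set of anchor samples, the two similarity functions chosen, the plain concatenation and elementwise-sum aggregations, and every function name are hypothetical.

```python
import numpy as np


def cosine_similarity(z: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """Cosine similarity to each anchor; unchanged by rotations and rescaling."""
    z_unit = z / np.linalg.norm(z, axis=1, keepdims=True)
    a_unit = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return z_unit @ a_unit.T


def euclidean_distance(z: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """Euclidean distance to each anchor; unchanged by rotations and translations."""
    diff = z[:, None, :] - anchors[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def product_projection(z, anchors, similarities, aggregate="concat"):
    """Combine several invariant components into a single representation.

    z:            (n, d) latent vectors
    anchors:      (k, d) latent vectors of the anchor samples (assumed setup)
    similarities: list of functions mapping (z, anchors) -> (n, k)
    aggregate:    "concat" gives (n, k * len(similarities)); "sum" gives (n, k)
    """
    components = [d(z, anchors) for d in similarities]
    if aggregate == "concat":
        return np.concatenate(components, axis=1)
    if aggregate == "sum":
        # Simplification: raw elementwise sum with no learned layers.
        return np.sum(components, axis=0)
    raise ValueError(f"unknown aggregation: {aggregate!r}")
```

Because both components above are invariant to orthogonal transformations, applying the same rotation to `z` and `anchors` leaves the output of `product_projection` unchanged, which is the kind of agreement between latent spaces that the caption describes.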