Latent Communication in Artificial Neural Networks

Luca Moschella

TL;DR

The work introduces Latent Communication, a formal framework for unifying latent representations across independently trained neural networks through semantic correspondence between data spaces. It develops two core strategies: Relative Representations, which embed latent spaces into a universal, angle-invariant space via anchors, and Direct Translation, which learns affine/linear mappings to translate latent spaces directly, enabling zero-shot stitching across architectures and modalities. The thesis provides theoretical framing, extensive empirical validation across images, text, graphs, and audio, and demonstrates practical benefits including zero-shot model stitching, cross-domain retrieval, and cross-modality tasks without retraining. It further proposes extensions to overcome limitations (e.g., product of invariances, bootstrapping anchors) and presents real-world case studies (ASIF, chart merging, RL stitching). The work has influenced subsequent research and workshop activity, highlighting the potential to enable modular, data-efficient, and scalable reuse of neural components across domains.

Abstract

As NNs permeate various scientific and industrial domains, understanding the universality and reusability of their representations becomes crucial. At their core, these networks create intermediate neural representations of the input data, referred to as latent spaces, and subsequently leverage them to perform specific downstream tasks. This dissertation focuses on the universality and reusability of neural representations. Do the latent representations crafted by a NN remain exclusive to a particular trained instance, or can they generalize across models, adapting to factors such as randomness during training, model architecture, or even data domain? This adaptive quality introduces the notion of Latent Communication -- a phenomenon that describes when representations can be unified or reused across neural spaces. A salient observation from our research is the emergence of similarities in latent representations, even when these originate from distinct or seemingly unrelated NNs. By exploiting a partial correspondence between the two data distributions that establishes a semantic link, we found that these representations can either be projected into a universal representation, termed Relative Representation, or be directly translated from one space to another. Latent Communication allows for a bridge between independently trained NNs, irrespective of their training regimen, architecture, or the data modality they were trained on -- as long as the semantic content of the data stays the same (e.g., images and their captions). This holds true for generation, classification, and retrieval downstream tasks; in supervised, weakly supervised, and unsupervised settings; and spans various data modalities including images, text, audio, and graphs -- showcasing the universality of the Latent Communication phenomenon. [...]
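
To make the "directly translated from one space to another" idea concrete, here is a minimal sketch on synthetic data. It assumes the two latent spaces differ by an orthogonal transformation and estimates that map from a small set of paired anchors with orthogonal Procrustes (via SVD). The estimator choice, the function name `fit_orthogonal_translation`, and all sizes are illustrative assumptions, not the thesis's actual procedure.

```python
import numpy as np

def fit_orthogonal_translation(src_anchors, tgt_anchors):
    """Orthogonal R minimizing ||src_anchors @ R - tgt_anchors||_F (Procrustes)."""
    u, _, vt = np.linalg.svd(src_anchors.T @ tgt_anchors)
    return u @ vt

rng = np.random.default_rng(0)

# Hypothetical latent space A: 100 samples, 16 dimensions.
space_a = rng.normal(size=(100, 16))

# Hypothetical latent space B: the same samples under an unknown rotation.
true_q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
space_b = space_a @ true_q

# Partial correspondence: only 32 paired samples (anchors) are assumed known.
anchor_idx = rng.choice(100, size=32, replace=False)
r = fit_orthogonal_translation(space_a[anchor_idx], space_b[anchor_idx])

# Translate every sample of A into B's space, including non-anchor samples.
translated = space_a @ r
max_error = np.abs(translated - space_b).max()
```

In this idealized setting the map fitted on the anchors also aligns the samples that were not used as anchors; real latent spaces are only approximately related this way, so some residual error should be expected.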

Paper Structure (172 sections, 20 equations, 30 figures, 37 tables)

Figures (30)

  • Figure 1: Latent spaces learned by distinct trainings of the same AE on . The bottleneck has size $2$; thus, there is no dimensionality reduction in the latent space visualizations. The stochasticity in the training phase induces intrinsically similar representations.
  • Figure 2: The Latent Communication Problem. Two unobservable manifolds are embedded into their respective input spaces. The semantic relationship between these manifolds can be observed through a partial correspondence defined between the input spaces. Two encoding functions map the input spaces to their respective latent spaces, modifying the embedded manifolds and inducing a correlation between them through some transformation belonging to a given class of transformations. The objective is to discover two specific transformations, one per latent space, that map the latent spaces into a universal space in which the embeddings of the two latent manifolds coincide.
  • Figure 3: Latent spaces learned by distinct trainings of the same high-dimensional AE on the dataset. Each column is the latent space obtained by the AE with a different seed. In the first row, dimensionality reduction is performed through PCAs fitted independently on each latent space; in the second row, PCA is fitted on the leftmost latent space and then applied to all of them.
  • Figure 4: Relative representations. (left): a sample $x$ and three anchor samples $a_1, a_2, a_3$ are embedded in a latent space and lie on the underlying embedded data manifold. (right): each dimension is treated as a coefficient in a coordinate system defined by the anchors, so the new representation of $x$ is given by its similarities with respect to the anchors. Anchors are orthogonal in this example only for visualization purposes.
  • Figure 5: Graph node classification task on . Left: Correlation between the performance of $\approx2000$ models and the similarity of their latent spaces with respect to a well-performing reference model. Right: The same correlation plotted over time. The mean Pearson correlation over all models is $0.955$, after filtering out models whose best validation accuracy is below $0.5$.
  • ...and 25 more figures
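
The construction described in the Figure 4 caption, where a sample is re-expressed through its similarities to a set of anchors, can be sketched as follows. This is a minimal illustration on synthetic data, assuming cosine similarity as the similarity function (consistent with the "angle-invariant" description in the TL;DR). The function name `relative_representation` and the simulated second latent space are hypothetical.

```python
import numpy as np

def relative_representation(x, anchors):
    """Cosine similarity between each row of x and each anchor row."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return x @ a.T

rng = np.random.default_rng(0)

# Hypothetical latent encodings of 5 samples in an 8-dimensional space.
latents = rng.normal(size=(5, 8))
anchor_idx = [0, 1, 2]  # three samples chosen as anchors
rel = relative_representation(latents, latents[anchor_idx])

# Simulate a second latent space that differs from the first only by a
# random orthogonal transformation and a uniform rescaling.
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
latents_b = 3.0 * latents @ q
rel_b = relative_representation(latents_b, latents_b[anchor_idx])

# Angles are preserved, so both spaces give the same relative representation.
invariant = np.allclose(rel, rel_b)
```

Each sample ends up with one coordinate per anchor, so the dimensionality of the relative space equals the number of anchors (3 here) rather than the original latent dimensionality (8).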

Theorems & Definitions (1)

  • Definition: Product projection