Latent Space Translation via Inverse Relative Projection

Valentino Maiorca, Luca Moschella, Marco Fumero, Francesco Locatello, Emanuele Rodolà

TL;DR

By formalizing the invertibility of angle-preserving relative representations and assuming the scale invariance of decoder modules in neural models, this work can effectively use the relative space as an intermediary, independently projecting onto and from other semantically similar spaces.

Abstract

The emergence of similar representations between independently trained neural models has sparked significant interest in the representation learning community, leading to the development of various methods to obtain communication between latent spaces. "Latent space communication" can be achieved in two ways: i) by independently mapping the original spaces to a shared or relative one; ii) by directly estimating a transformation from a source latent space to a target one. In this work, we combine the two into a novel method to obtain latent space translation through the relative space. By formalizing the invertibility of angle-preserving relative representations and assuming the scale invariance of decoder modules in neural models, we can effectively use the relative space as an intermediary, independently projecting onto and from other semantically similar spaces. Extensive experiments over various architectures and datasets validate our scale invariance assumption and demonstrate the high accuracy of our method in latent space translation. We also apply our method to zero-shot stitching between arbitrary pre-trained text and image encoders and their classifiers, even across modalities. Our method has significant potential for facilitating the reuse of models in a practical manner via compositionality.
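The round trip described in the abstract can be sketched on toy data. The example below is a minimal illustration under stated assumptions, not the authors' implementation: the second space is simulated as a random orthogonal transform plus a global rescaling of the first, the relative projection is taken to be cosine similarity against a shared set of anchors, and the inversion uses a plain Moore-Penrose pseudo-inverse. The function names (`relative_projection`, `inverse_relative_projection`) are hypothetical.

```python
import numpy as np


def normalize(x):
    """Scale each row to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def relative_projection(x, anchors):
    """Cosine similarity of every row of x to every anchor (absolute -> relative)."""
    return normalize(x) @ normalize(anchors).T


def inverse_relative_projection(rel, anchors):
    """Map relative coordinates back to directions in the anchors' absolute space.

    Assumption: a plain pseudo-inverse of the normalized anchor matrix.
    The result is only defined up to scale, because cosine similarity
    discards the norm of the original vectors.
    """
    return rel @ np.linalg.pinv(normalize(anchors).T)


rng = np.random.default_rng(0)
dim, n_anchors, n_samples = 8, 32, 5

# Toy "semantically similar" spaces: Y is a rotated and rescaled copy of X.
x_space = rng.normal(size=(n_samples + n_anchors, dim))
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
y_space = 3.0 * x_space @ rotation

x, anchors_x = x_space[:n_samples], x_space[n_samples:]
y, anchors_y = y_space[:n_samples], y_space[n_samples:]

# X -> Z uses only X's anchors; Z -> Y uses only Y's anchors.
rel = relative_projection(x, anchors_x)
y_hat = inverse_relative_projection(rel, anchors_y)

# Compare directions only: the norm of y is not recoverable from rel.
cos = np.sum(normalize(y_hat) * normalize(y), axis=1)
```

In this idealized setting the translated vectors match the targets in direction but not in norm, which is why the method additionally relies on the decoder being insensitive to the scale of its input.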

Paper Structure

This paper contains 29 sections, 6 equations, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: Zero-shot stitching of the absolute spaces $\mathbf{X}$ and $\mathbf{Y}$ using relative representations, direct latent translation, and our method (IRP). Relative representations require $\text{dec}_{\mathbf{Z}}$ to stitch. Direct translation requires estimating $\mathcal{T}$ between $\mathbf{X}$ and $\mathbf{Y}$ directly, so both spaces must be available at the same time. Our method instead first maps to the bridge relative space $\mathbf{Z}$ and then, using $A_{\mathbf{Y}}$, maps independently to $\mathbf{Y}$.
  • Figure 2: Scale invariance of RoBERTa, measured by the performance of a downstream classifier trained on the encodings of the last attention layer. In each run, we rescale the encodings of a single layer (0 being the embedding layer and 12 the output one) by the specified $\alpha$ and measure the effect on the final accuracy. The performance without any rescaling is $0.92$.
  • Figure 3: Top row: sensitivity of the reconstruction similarity to the number of subspaces and the pruning threshold ($\delta$) for intra-space inversion (left) and inter-space inversion between different encoders (right) on the coarse-grained Cifar100. Bottom row: the corresponding condition number, averaged over the subspaces. Higher pruning thresholds lower the condition number, which stabilizes the matrix inverse and thus increases the reconstruction similarity.
  • Figure 4: Performance comparison of three Multilayer Perceptrons (MLPs) with different activation functions, namely cosine (blue), ReLU (orange), and tanh (green) at different rescaling factors $\alpha$. The ReLU and tanh MLPs exhibit scale invariance, while the cosine activation function is only invariant on the mean data scale and its periodic cycles.
  • Figure 5: Distribution of the embedding scales in different pre-trained encoders on the Cifar100 dataset. Well-behaved Gaussian distributions with a single mode and well-defined mean are crucial for our method, as they support the ability to rescale the embeddings to a mean scale.
  • ...and 2 more figures
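
The scale invariance probed in Figures 2 and 4 is an empirical finding about trained models. As a toy illustration only, and not a reproduction of those experiments, the sketch below shows one special case where invariance holds exactly: a bias-free ReLU MLP is positively homogeneous, so multiplying its input by any $\alpha > 0$ multiplies the logits by $\alpha$ and leaves the predicted class unchanged. The random weights, layer sizes, and the `classify` helper are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bias-free two-layer ReLU classifier with random weights.
w1 = rng.normal(size=(16, 32))
w2 = rng.normal(size=(32, 4))


def classify(x):
    """Return the predicted class index for each row of x."""
    hidden = np.maximum(x @ w1, 0.0)  # ReLU, no bias term
    return np.argmax(hidden @ w2, axis=1)


x = rng.normal(size=(100, 16))
baseline = classify(x)

# Fraction of predictions that survive rescaling the input by alpha.
agreement = {
    alpha: float(np.mean(classify(alpha * x) == baseline))
    for alpha in (0.25, 1.0, 4.0, 64.0)
}
```

Bias terms, tanh saturation, or normalization layers break this exact argument, which is presumably why the invariance of real models has to be checked empirically as in the figures above.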