Measuring similarity between embedding spaces using induced neighborhood graphs

Tiago F. Tavares, Fabio Ayres, Paris Smaragdis

TL;DR

This work proposes a metric to evaluate the similarity between paired item representations. The metric is built from the structural similarity between the nearest-neighbor graphs induced by each representation, and it can be configured to compare spaces using different distance metrics and different neighborhood sizes.

Abstract

Deep learning techniques have excelled at generating embedding spaces that capture semantic similarities between items. Often these representations are paired, enabling experiments with analogies (pairs within the same domain) and cross-modality (pairs across domains). These experiments rely on specific assumptions about the geometry of embedding spaces: paired items are found by extrapolating the positional relationships between embedding pairs in the training dataset, which enables tasks such as finding new analogies and multimodal zero-shot classification. In this work, we propose a metric to evaluate the similarity between paired item representations. Our proposal is built from the structural similarity between the nearest-neighbor graphs induced by each representation, and can be configured to compare spaces using different distance metrics and different neighborhood sizes. We demonstrate that our proposal can identify similar structures at different scales, which is hard to achieve with kernel methods such as Centered Kernel Alignment (CKA). We further illustrate our method with two case studies: an analogy task using GloVe embeddings, and zero-shot classification on the CIFAR-100 dataset using CLIP embeddings. Our results show that accuracy in both analogy and zero-shot classification tasks correlates with the embedding similarity. These findings can help explain performance differences in these tasks and may lead to improved design of paired-embedding models in the future.
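The idea of comparing two paired embedding spaces through their nearest-neighbor graphs can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that "structural similarity" means the Jaccard index between the $k$-nearest-neighbor sets of each paired item, averaged over all items, which this page does not spell out. The function names `pairwise_distances`, `knn_sets`, and `nngs`, and the `metric` argument, are hypothetical.

```python
import numpy as np


def pairwise_distances(X, metric="euclidean"):
    """Return an (n, n) matrix that ranks neighbors under the chosen metric."""
    X = np.asarray(X, dtype=float)
    if metric == "euclidean":
        # Squared Euclidean distances: same neighbor ranking as true distances.
        sq = np.sum(X * X, axis=1)
        D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    elif metric == "cosine":
        # Assumes no all-zero rows (their norm would be zero).
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        D = 1.0 - Xn @ Xn.T
    else:
        raise ValueError(f"unsupported metric: {metric!r}")
    np.fill_diagonal(D, np.inf)  # an item is never its own neighbor
    return D


def knn_sets(X, k, metric="euclidean"):
    """For each row of X, return the set of indices of its k nearest neighbors."""
    D = pairwise_distances(X, metric)
    idx = np.argsort(D, axis=1, kind="stable")[:, :k]
    return [set(row.tolist()) for row in idx]


def nngs(X, Y, k, metric="euclidean"):
    """Assumed similarity: mean Jaccard index between the k-NN sets of paired rows.

    Row i of X and row i of Y must represent the same item.
    """
    X, Y = np.asarray(X), np.asarray(Y)
    n = X.shape[0]
    if Y.shape[0] != n:
        raise ValueError("X and Y must contain the same number of paired items")
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    neighbors_x = knn_sets(X, k, metric)
    neighbors_y = knn_sets(Y, k, metric)
    scores = [len(a & b) / len(a | b) for a, b in zip(neighbors_x, neighbors_y)]
    return float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 50))           # n = 100 items, d = 50
    noisy = X + 0.1 * rng.standard_normal(X.shape)
    unrelated = rng.standard_normal((100, 30))   # dimensions may differ
    print("noisy copy:", nngs(X, noisy, k=20))
    print("unrelated :", nngs(X, unrelated, k=20))
```

Because only neighbor identities are compared, the two spaces may have different dimensionalities, and the score in this sketch is unchanged by rotations or uniform rescaling of either space. Varying `k` changes the scale at which structure is compared: small values probe local neighborhoods, larger values probe coarser structure.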

Paper Structure

This paper contains 25 sections, 16 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: Mean structural similarity in the presence of white noise. The point clouds were generated using $n=100$ points and $d=50$ dimensions. The impact of changing $n$ and $d$ is discussed in Section \ref{sec:change_n} and Section \ref{sec:change_d}, respectively. The curves were bootstrapped to obtain the mean and standard deviation, and the depicted intervals correspond to two standard deviations. At very low SNRs, $\text{NNGS}(X,Y,k) \rightarrow H(k)$, while at high SNRs $\text{NNGS}(X,Y,k) \rightarrow 1$. An SNR sweep is provided in Appendix \ref{sec:white_noise_sweep}.
  • Figure 2: The similarity $\text{NNGS}(X,Y,k)$ remains nearly constant as the point cloud size $n$ increases as long as $k = \lfloor c(n-1) \rfloor$ for a constant value of $c$. In this figure, we arbitrarily chose $c=0.2$.
  • Figure 3: The similarity $\text{NNGS}(X,Y,k)$ is not affected by increasing dimensionality, except in very low-dimensional settings (fewer than 10 dimensions). In this figure, we fixed $k=20$ and the point cloud size $n=100$. This is a consequence of the fact that random points in higher dimensions are likely to be nearly orthogonal to each other, so the noise variance required to move a point closer to different neighbors is similar regardless of the dimensionality.
  • Figure 4: Mean structural similarity in GloVe embeddings for each analogy task. The curves have a shape similar to those in Figure 1, and some tasks clearly have greater similarity than others.
  • Figure 5: NNGS and analogy accuracy in GloVe embeddings for each task. There is a clear trend in which higher structural similarity is associated with higher analogy accuracy (Pearson's $\rho=0.86$).
  • ...and 2 more figures