ContraSim -- Analyzing Neural Representations Based on Contrastive Learning

Adir Rahamim, Yonatan Belinkov

TL;DR

ContraSim introduces a trainable, contrastive-learning–based similarity measure for neural representations that uses sets of positive and negative examples to produce domain-adapted projections. It trains a lightweight encoder to map representations to a space where a simple dot product (after $L_2$ normalization) yields the similarity, optimized via a contrastive objective with temperature $\tau$. Across language and vision tasks, ContraSim outperforms traditional measures (e.g., CKA, PWCCA) on the layer-prediction, multilingual, and image-caption benchmarks, and it is robust to hard negatives sampled with FAISS. The method also provides new interpretability insights, such as layer-wise divergence across modalities and cross-language alignment dynamics, and remains effective when transferred across domains or when representation dimensions vary.
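The scoring and training signal described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a single linear map as the "lightweight encoder" and a generic InfoNCE-style loss, and the names `encode`, `contrastive_loss`, `weight`, and the default `tau=0.1` are hypothetical choices made for the example.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

def encode(reps, weight):
    """Hypothetical lightweight encoder: one linear map, then L2 normalization."""
    return l2_normalize(reps @ weight)

def contrastive_loss(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style loss for one anchor (assumed form, inputs already encoded).

    anchor: (d,), positive: (d,), negatives: (n, d).
    Similarity is the dot product of unit vectors, divided by temperature tau.
    """
    pos_logit = anchor @ positive / tau
    neg_logits = negatives @ anchor / tau
    logits = np.concatenate(([pos_logit], neg_logits))
    # Numerically stable negative log-softmax of the positive entry.
    logits = logits - logits.max()
    return -(logits[0] - np.log(np.exp(logits).sum()))

# Toy usage with random data: a well-aligned positive gives a lower loss
# than a positive pointing in the opposite direction.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))                      # hypothetical encoder weights
a = encode(rng.normal(size=(1, 8)), W)[0]
negs = encode(rng.normal(size=(5, 8)), W)
loss_good = contrastive_loss(a, a, negs)
loss_bad = contrastive_loss(a, -a, negs)
```

In a real setup the encoder weights would be updated by gradient descent on this loss; the sketch only evaluates the loss to show how positives, negatives, and $\tau$ enter the score.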

Abstract

Recent work has compared neural network representations via similarity-based analyses to improve model interpretation. The quality of a similarity measure is typically evaluated by its success in assigning a high score to representations that are expected to be matched. However, existing similarity measures perform mediocrely on standard benchmarks. In this work, we develop a new similarity measure, dubbed ContraSim, based on contrastive learning. In contrast to common closed-form similarity measures, ContraSim learns a parameterized measure by using both similar and dissimilar examples. We perform an extensive experimental evaluation of our method, with both language and vision models, on the standard layer prediction benchmark and two new benchmarks that we introduce: the multilingual benchmark and the image-caption benchmark. In all cases, ContraSim achieves much higher accuracy than previous similarity measures, even when presented with challenging examples. Finally, ContraSim is more suitable for the analysis of neural networks, revealing new insights not captured by previous measures.

Paper Structure (48 sections, 10 equations, 5 figures, 9 tables)

This paper contains 48 sections, 10 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Layer prediction benchmark. Given two models, A and B, that differ only in weight initialization, each layer of the first model should receive its highest similarity, among all layers of the second model, with the architecturally corresponding layer.
  • Figure 2: The multilingual benchmark. $r_1^E$ and $r_1^G$ denote the representations of the same sentence in different languages, and $S_1$ is their similarity. $r_2^E$ represents the random sentence representation, and $S_2$ is the similarity between it and $r_1^G$. We expect $S_1$ to be higher than $S_2$.
  • Figure 3: The image--caption benchmark. $r_1^C$ and $r^i$ denote the representations of a caption and its paired image, respectively, and $S_1$ is their similarity. $r_2^C$ denotes the representation of a random caption, and $S_2$ is the similarity between it and $r^i$. $S_1$ should be greater than $S_2$.
  • Figure 4: Original representations (left) are clustered by the source language (by shape). ContraSim (right) projects representations of the same sentence in different languages close by (by color).
  • Figure 5: Image--caption benchmark results for 4 different model pairs. ContraSim works best, and is the only measure robust to FAISS sampling.
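
The scoring rules implied by the captions of Figures 1 to 3 can be sketched as below. This is an illustrative reading of the captions rather than the paper's evaluation code: it assumes the similarity scores have already been computed by some measure, that both models have the same number of layers, and the function names and toy numbers are hypothetical.

```python
import numpy as np

def layer_prediction_accuracy(sim):
    """Layer prediction benchmark (Figure 1).

    sim[i, j] is the similarity between layer i of model A and layer j of
    model B. A layer counts as correct when its most similar layer in B is
    the architecturally corresponding one (same index).
    """
    sim = np.asarray(sim, dtype=float)
    predicted = sim.argmax(axis=1)
    return float(np.mean(predicted == np.arange(sim.shape[0])))

def pairwise_benchmark_accuracy(s1, s2):
    """Multilingual and image-caption benchmarks (Figures 2 and 3).

    s1[k] is the similarity of a matched pair, s2[k] the similarity with a
    randomly chosen counterpart. An example is correct when S1 > S2.
    """
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    return float(np.mean(s1 > s2))

# Toy similarity matrix for two 3-layer models (made-up numbers).
toy_sim = [[0.9, 0.2, 0.1],
           [0.3, 0.8, 0.4],
           [0.5, 0.6, 0.2]]
layer_acc = layer_prediction_accuracy(toy_sim)   # layers 0 and 1 are matched

# Toy matched-pair and random-pair scores (made-up numbers).
pair_acc = pairwise_benchmark_accuracy([0.9, 0.4, 0.7, 0.6],
                                       [0.1, 0.5, 0.2, 0.3])
```

In the toy data, two of the three layers are matched correctly and three of the four matched pairs outscore their random counterpart.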