
Map of Encoders -- Mapping Sentence Encoders using Quantum Relative Entropy

Gaifan Zhang, Danushka Bollegala

TL;DR

This work addresses the challenge of comparing and visualizing the vast space of pre-trained sentence encoders. It introduces a quantum-inspired pipeline that represents each encoder as a density matrix derived from its embeddings of a fixed sentence set, computes a QRE-based feature vector relative to a unit base encoder, and visualizes the encoders with t-SNE in a shared space. The resulting map groups encoders meaningfully by architecture, training data, and task-specific fine-tuning, and the feature vectors correlate strongly with downstream retrieval and clustering performance, so the performance of new encoders can be inferred. The approach offers a scalable, dimension-agnostic map of encoders that is useful for model selection and for understanding the encoder landscape, while acknowledging limitations in the datasets and languages covered and the need for broader multilingual validation.
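According to the Figure 5 caption, the correlation with downstream performance is measured as a Spearman correlation between true and predicted task performance. The sketch below computes Spearman's rank correlation in plain Python, giving tied values their average rank. The function names and the scores in the example are hypothetical toy values, not results from the paper, and how predictions are produced from the feature vectors is not shown here.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(true_scores, predicted_scores):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(true_scores), average_ranks(predicted_scores)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Toy example: hypothetical retrieval scores for four encoders.
true_scores = [0.61, 0.55, 0.72, 0.40]
predicted_scores = [0.58, 0.50, 0.70, 0.45]
print(round(spearman(true_scores, predicted_scores), 6))  # 1.0: same ordering
```

An average over several tasks, as in Figure 5, would simply take the mean of the per-task values returned by `spearman`.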

Abstract

We propose a method to compare and visualise sentence encoders at scale by creating a map of encoders, where each sentence encoder is represented in relation to the other sentence encoders. Specifically, we first represent a sentence encoder by an embedding matrix over a sentence set, where each row is the embedding of one sentence. Next, we compute the Pairwise Inner Product (PIP) matrix of the sentence encoder from its embedding matrix. Finally, we create a feature vector for each sentence encoder reflecting its Quantum Relative Entropy (QRE) with respect to a unit base encoder. We construct a map covering 1101 publicly available sentence encoders, providing a new perspective on the landscape of pre-trained sentence encoders. Our map accurately reflects various relationships between encoders: encoders with similar attributes are located close to each other on the map. Moreover, our encoder feature vectors can be used to accurately infer the downstream task performance of the encoders, such as in retrieval and clustering tasks, demonstrating the faithfulness of our map.
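The pipeline in the abstract (embedding matrix, PIP matrix, QRE) can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the authors' implementation: turning the PIP matrix into a density matrix by dividing it by its trace is an assumption, as is the natural logarithm, and the unit base encoder and the exact construction of the feature vector are not reproduced. The names `density_matrix` and `quantum_relative_entropy` are hypothetical. Because the PIP matrix of n sentences is n × n whatever the embedding dimension, encoders of different sizes yield directly comparable matrices.

```python
import numpy as np


def density_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Assumed construction: PIP matrix of an (n_sentences x dim) embedding matrix, divided by its trace."""
    pip = embeddings @ embeddings.T  # Pairwise Inner Product matrix, n_sentences x n_sentences
    return pip / np.trace(pip)       # symmetric, positive semi-definite, unit trace


def quantum_relative_entropy(rho: np.ndarray, sigma: np.ndarray, tol: float = 1e-12) -> float:
    """S(rho || sigma) = tr(rho log rho) - tr(rho log sigma), natural log."""
    lam, V = np.linalg.eigh(rho)
    mu, U = np.linalg.eigh(sigma)
    V, lam = V[:, lam > tol], lam[lam > tol]  # drop zero eigenpairs of rho (0 log 0 = 0)
    U, mu = U[:, mu > tol], mu[mu > tol]      # keep only the support of sigma
    overlap = (V.T @ U) ** 2                  # overlap[i, j] = (v_i^T u_j)^2
    if np.any(overlap.sum(axis=1) < 1.0 - 1e-8):
        return float("inf")                   # rho has weight outside the support of sigma
    return float(lam @ np.log(lam) - lam @ (overlap @ np.log(mu)))


rng = np.random.default_rng(0)
rho = density_matrix(rng.normal(size=(20, 384)))     # toy 384-dimensional encoder
sigma = density_matrix(rng.normal(size=(20, 1024)))  # toy 1024-dimensional encoder
print(rho.shape == sigma.shape)                      # True: both are 20 x 20
print(quantum_relative_entropy(rho, rho))            # approximately 0, up to rounding
print(quantum_relative_entropy(rho, sigma) >= 0)     # True (Klein's inequality)
```

Writing both matrices in their eigenbases makes the support condition explicit: if ρ places weight on a direction where σ has a zero eigenvalue, the QRE is infinite, which is the usual convention.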

Paper Structure (31 sections, 2 theorems, 59 equations, 12 figures, 4 tables)

Key Result

Theorem 1

Let $\boldsymbol{\rho}$ be the density matrix of an encoder with non-zero orthonormal eigenpairs $\{(\lambda_i, \mathbf{v}_i)\}_{i=1}^{K_{\rho}}$. Let $\boldsymbol{\sigma}$ be the density matrix of a different encoder with non-zero orthonormal eigenpairs $\{(\mu_j, \mathbf{u}_j)\}_{j=1}^{K_{\sigma}}$. By perturbing … where $c_i = \sum_{j=1}^{K_{\sigma}} (\mathbf{v}_i^\top \mathbf{u}_j)^2$ is the captured mass, $r_i = 1 - c_{\ldots}$ …
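The captured mass $c_i$ in Theorem 1 can be computed directly from the two eigendecompositions. The sketch below assumes a trace-normalised PIP matrix as the density matrix; `density_matrix` and `captured_mass` are hypothetical names, and since the statement is cut off, $r_i = 1 - c_i$ is an assumed reading. Because the $\mathbf{u}_j$ are orthonormal, $c_i$ is the squared length of $\mathbf{v}_i$ projected onto the support of $\boldsymbol{\sigma}$, so $0 \le c_i \le 1$. A standard property of QRE is that it is infinite whenever some $\lambda_i > 0$ has $r_i > 0$; how the theorem's perturbation handles this is not visible in the truncated statement.

```python
import numpy as np


def density_matrix(embeddings):
    # Assumption: density matrix = PIP matrix divided by its trace.
    pip = embeddings @ embeddings.T
    return pip / np.trace(pip)


def captured_mass(rho, sigma, tol=1e-12):
    """Return (lambda, c, r) for the non-zero eigenpairs of rho.

    c_i = sum_j (v_i^T u_j)^2 over the non-zero eigenvectors u_j of sigma;
    r_i = 1 - c_i is the (assumed) residual mass outside the support of sigma.
    """
    lam, V = np.linalg.eigh(rho)
    mu, U = np.linalg.eigh(sigma)
    keep = lam > tol
    lam, V = lam[keep], V[:, keep]
    U = U[:, mu > tol]
    c = ((V.T @ U) ** 2).sum(axis=1)
    return lam, c, 1.0 - c


rng = np.random.default_rng(1)
rho = density_matrix(rng.normal(size=(8, 16)))         # full rank: K_rho = 8
sigma_full = density_matrix(rng.normal(size=(8, 32)))  # full rank: K_sigma = 8
sigma_low = density_matrix(rng.normal(size=(8, 3)))    # rank 3: K_sigma = 3

_, c_full, r_full = captured_mass(rho, sigma_full)
_, c_low, r_low = captured_mass(rho, sigma_low)
print(np.allclose(c_full, 1.0))      # True: sigma's support is the whole space
print(round(float(c_low.sum()), 6))  # 3.0: with full-rank rho, the masses sum to rank(sigma)
```

The second check follows because the $\mathbf{v}_i$ of a full-rank $\boldsymbol{\rho}$ form an orthonormal basis, so $\sum_i c_i$ equals the trace of the projector onto the support of $\boldsymbol{\sigma}$, i.e. $K_{\sigma}$.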

Figures (12)

  • Figure 1: Map of Encoders. Top: Map for 1101 sentence encoders, visualised by t-SNE and coloured by the encoder type. Bottom: A 30% zoomed-in view of the dotted area showing the top-7 nearest neighbours for https://huggingface.co/BAAI/bge-base-en-v1.5 and https://huggingface.co/BAAI/bge-large-en-v1.5. We see that the nearest neighbours belong to the same primary architecture and are closely located.
  • Figure 2: An illustrative example of the visualisation of QRE-based feature vectors. We use the unit base encoder with 10,000 basis vectors and generate two groups of synthetic embeddings sampled from the normal distribution $\mathcal{N}(0, \sigma^2 \mathbf{I})$, one with low noise ($\sigma^2 \in [0, 1]$) and one with high noise ($\sigma^2 \in [3, 4]$). Each group has 10 embedding matrices. We compute the feature vectors with our proposed QRE method and visualise them using t-SNE. The low- and high-noise groups are clearly separated by distinguishable QRE values (the sum of each feature vector's entries), validating the faithfulness of our method. See the section on this illustrative example for details.
  • Figure 3: Maps by attributes.
  • Figure 4: Hierarchical clustering of the 100 most downloaded encoders, coloured by encoder type. $\ell_1$ values are reported on a log scale for better visualisation. A zoomed-in version is shown in a separate section of the paper.
  • Figure 5: Submap of 112 encoders coloured by the average Spearman correlation between the true and predicted task performance of selected tasks.
  • ...and 7 more figures
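Figure 4 groups encoders by hierarchical clustering and reports $\ell_1$ values. The NumPy sketch below performs naive agglomerative clustering under the $\ell_1$ (cityblock) distance between encoder feature vectors. The average-linkage criterion, the helper names and the toy feature vectors are assumptions for illustration; the caption does not state which linkage the authors used. Sorting one row of the same distance matrix could also rank nearest neighbours of an encoder.

```python
import numpy as np


def l1_distance_matrix(features):
    """D[a, b] = sum_k |x_a[k] - x_b[k]| for every pair of feature vectors."""
    return np.abs(features[:, None, :] - features[None, :, :]).sum(axis=-1)


def average_linkage(features):
    """Naive agglomerative clustering with average l1 linkage.

    Returns merges as (cluster_a, cluster_b, distance); leaves are 0..n-1 and
    each merge creates a new cluster id n, n+1, ...
    """
    D = l1_distance_matrix(features)
    clusters = {i: [i] for i in range(len(features))}
    merges = []
    next_id = len(features)
    while len(clusters) > 1:
        ids = list(clusters)
        best = None
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                d = D[np.ix_(clusters[a], clusters[b])].mean()  # mean pairwise l1 distance
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, float(d)))
        next_id += 1
    return merges


# Toy feature vectors for four hypothetical encoders forming two tight pairs.
features = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
for a, b, d in average_linkage(features):
    print(a, b, d)
# 0 1 1.0
# 2 3 1.0
# 4 5 20.0
```

For realistic numbers of encoders, SciPy's `scipy.cluster.hierarchy.linkage(features, method="average", metric="cityblock")` builds the same kind of average-linkage tree and returns it as an (n − 1) × 4 linkage matrix; taking `np.log` of the merge heights gives a log-scale view like the one described for Figure 4.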

Theorems & Definitions (5)

  • theorem 1
  • theorem 2
  • proof
  • proof
  • proof