Map of Encoders -- Mapping Sentence Encoders using Quantum Relative Entropy
Gaifan Zhang, Danushka Bollegala
TL;DR
This work addresses the challenge of comparing and visualizing the large and growing space of pre-trained sentence encoders. It introduces a quantum-inspired pipeline that represents each encoder as a density matrix derived from its embeddings of a fixed sentence set, computes a feature vector based on Quantum Relative Entropy (QRE) with respect to a unit base encoder, and projects the encoders into a shared space with t-SNE. The resulting map groups encoders by architecture, training data, and task fine-tuning, and the feature vectors correlate strongly with downstream retrieval and clustering performance, which makes it possible to infer the performance of novel encoders. Because every encoder is compared through the same sentence set, the approach is scalable and independent of embedding dimensionality, with practical utility for model selection and for understanding the encoder landscape. The authors acknowledge dataset and language limitations and the need for broader multilingual validation.
Abstract
We propose a method to compare and visualise sentence encoders at scale by creating a map of encoders in which each sentence encoder is represented in relation to the other sentence encoders. Specifically, we first represent a sentence encoder using its embedding matrix over a fixed sentence set, where each row corresponds to the embedding of a sentence. Next, we compute the Pairwise Inner Product (PIP) matrix of the sentence encoder from its embedding matrix. Finally, we create a feature vector for each sentence encoder reflecting its Quantum Relative Entropy (QRE) with respect to a unit base encoder. We construct a map of encoders covering 1101 publicly available sentence encoders, providing a new perspective on the landscape of pre-trained sentence encoders. Our map accurately reflects various relationships between encoders: encoders with similar attributes are located close to each other on the map. Moreover, our encoder feature vectors can be used to accurately infer the downstream task performance of the encoders, for example in retrieval and clustering tasks, demonstrating the faithfulness of our map.
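The first steps of the pipeline can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: the PIP matrix is turned into a density matrix by dividing by its trace, the unit base encoder is modelled as the maximally mixed state I/n, and the function names are hypothetical. The sketch returns a single scalar QRE, whereas the method described above builds a feature vector whose exact construction is not given here.

```python
import numpy as np

def pip_matrix(E):
    """Pairwise Inner Product matrix of an (n_sentences x dim) embedding matrix."""
    return E @ E.T

def density_matrix(E):
    """Assumption: normalise the PIP matrix to unit trace so it is a valid density matrix."""
    P = pip_matrix(E)
    return P / np.trace(P)

def qre_to_maximally_mixed(rho, eps=1e-12):
    """QRE S(rho || I/n), with I/n standing in for the unit base encoder (assumption).

    For sigma = I/n the relative entropy reduces to log(n) minus the
    von Neumann entropy of rho.
    """
    n = rho.shape[0]
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > eps]  # drop (numerically) zero eigenvalues: 0 * log 0 = 0
    von_neumann_entropy = -np.sum(evals * np.log(evals))
    return np.log(n) - von_neumann_entropy

# Two toy "encoders" with different embedding sizes over the same 5 sentences.
rng = np.random.default_rng(0)
E_small = rng.normal(size=(5, 8))
E_large = rng.normal(size=(5, 32))
qre_small = qre_to_maximally_mixed(density_matrix(E_small))
qre_large = qre_to_maximally_mixed(density_matrix(E_large))
```

Both toy encoders yield a 5 x 5 density matrix regardless of their embedding size, which is why encoders of different dimensionality can be compared through the same sentence set.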
