
Matryoshka-Adaptor: Unsupervised and Supervised Tuning for Smaller Embedding Dimensions

Jinsung Yoon, Raj Sinha, Sercan O Arik, Tomas Pfister

TL;DR

This work tackles the latency-cost bottleneck of high-dimensional LLM embeddings in information retrieval. It introduces Matryoshka-Adaptor, a tunable framework that morphs pre-trained embeddings into Matryoshka representations via unsupervised and supervised objectives. Key contributions include pairwise/top-k similarity losses, a skip-connection and reconstruction regularizer, a ranking loss for supervised data, and a two-stage training strategy, achieving roughly 2x unsupervised and 6x supervised dimensionality reductions with preserved BEIR/MIRACL/Fashion-200K performance. The approach is model- and API-agnostic, extends to multimodal and multilingual embeddings, and outperforms PCA and prior retrieval adapters in reducing latency while maintaining accuracy.
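The TL;DR names a skip connection and a reconstruction regularizer but does not give their exact form. The following is a minimal pure-Python sketch that assumes a single linear residual layer (the hypothetical `weights` matrix W, so the adapted embedding is e + We), nested Matryoshka views taken as prefixes of the adapted embedding, and a squared-L2 reconstruction penalty. All function names here are illustrative, not the authors' code.

```python
def matvec(weights, vec):
    """Multiply a square matrix, given as a list of rows, by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def adapt(embedding, weights):
    """Hypothetical residual adaptor: returns e + W e.

    The skip connection means W = 0 yields the identity map, so training can
    start from the pre-trained embedding and only learn a correction.
    """
    delta = matvec(weights, embedding)
    return [e + d for e, d in zip(embedding, delta)]


def matryoshka_views(embedding, dims):
    """Nested reduced embeddings: the first m coordinates for each m in dims."""
    return {m: embedding[:m] for m in dims}


def reconstruction_penalty(original, adapted):
    """Assumed regularizer: squared L2 distance between adapted and original."""
    return sum((a - o) ** 2 for a, o in zip(adapted, original))


e = [0.5, -1.0, 2.0, 0.25]             # toy "pre-trained" embedding
W = [[0.0] * 4 for _ in range(4)]      # zero init: adaptor starts as the identity
W[0][2] = 0.1                          # a hypothetical learned weight
a = adapt(e, W)
print(matryoshka_views(a, [2, 4])[2])  # first two coordinates of the adapted embedding
print(round(reconstruction_penalty(e, a), 4))  # 0.04
```

Zero-initializing W is one common way to make a residual adaptor start from the original embedding; whether the paper uses this initialization is an assumption.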

Abstract

Embeddings from Large Language Models (LLMs) have emerged as critical components in various applications, particularly for information retrieval. While high-dimensional embeddings generally demonstrate superior performance as they contain more salient information, their practical application is frequently hindered by elevated computational latency and the associated higher cost. To address these challenges, we propose Matryoshka-Adaptor, a novel tuning framework designed for the customization of LLM embeddings. Matryoshka-Adaptor facilitates substantial dimensionality reduction while maintaining comparable performance levels, thereby achieving a significant enhancement in computational efficiency and cost-effectiveness. Our framework directly modifies the embeddings from pre-trained LLMs and is designed to integrate seamlessly with any LLM architecture, including those accessible exclusively through black-box APIs. It is also effective in both unsupervised and supervised learning settings. A rigorous evaluation across a diverse collection of English, multilingual, and multimodal datasets consistently reveals substantial gains with Matryoshka-Adaptor. Notably, with Google and OpenAI Embedding APIs, Matryoshka-Adaptor reduces dimensionality two- to twelve-fold without compromising performance across multiple BEIR datasets.
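To make the cost argument concrete: a brute-force dot-product scan performs one multiply-add per dimension per corpus vector, so its cost grows linearly with embedding size. The sketch below assumes plain cosine-similarity retrieval over unit-normalized prefixes of each embedding; the function names and the toy corpus are hypothetical.

```python
import math


def normalize(v):
    """Scale v to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def retrieve_top_k(query, corpus, k, m):
    """Rank corpus items by cosine similarity using only the first m dimensions."""
    q = normalize(query[:m])
    scored = []
    for idx, doc in enumerate(corpus):
        d = normalize(doc[:m])
        scored.append((sum(a * b for a, b in zip(q, d)), idx))
    # Highest similarity first; ties broken by lower corpus index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in scored[:k]]


def flops_per_query(num_docs, m):
    """Multiply-adds for a brute-force dot-product scan over num_docs vectors of size m."""
    return num_docs * m


query = [1.0, 0.0, 0.0, 0.0]
corpus = [[0.0, 1.0, 0.0, 0.0], [1.0, 0.0, 5.0, 0.0], [0.9, 0.1, 0.0, 0.0]]
print(retrieve_top_k(query, corpus, k=1, m=2))  # [1]
print(retrieve_top_k(query, corpus, k=1, m=4))  # [2]
print(flops_per_query(1_000_000, 3072) / flops_per_query(1_000_000, 256))  # 12.0
```

Going from the 3072 dimensions of OpenAI text-embedding-3-large to a hypothetical 256-dimensional prefix cuts the scan cost twelve-fold, the upper end of the range quoted above. In the toy corpus, naive truncation to two dimensions changes the top result. This is the kind of ranking distortion an adaptor has to avoid if accuracy is to be preserved at the smaller size.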

Paper Structure

This paper contains 37 sections, 6 equations, 17 figures, and 5 tables.

Figures (17)

  • Figure 1: The effectiveness of the Matryoshka-Adaptor in dimensionality reduction. In both the unsupervised (red line) and supervised (black line) settings, the Matryoshka-Adaptor considerably decreases embedding dimensions with a negligible impact on nDCG@10 retrieval performance on the BEIR SciFact dataset. Notably, at the same embedding dimensionality, our approach yields significantly better performance.
  • Figure 2: Similarity loss is a measure of the discrepancy between the similarity of two embeddings in their original dimensional space and their similarity in a reduced dimensional space. If the orange and blue embeddings are chosen randomly, this loss is referred to as pairwise similarity loss. If the orange and blue embeddings are selected based on similarity in their original dimensional space (top-k nearest embeddings), this loss is referred to as top-k similarity loss. Note that top-k similarity loss focuses on preserving local similarity relationships among neighboring embeddings.
  • Figure 3: Block diagrams illustrating both the unsupervised and supervised Matryoshka-Adaptor frameworks. Unsupervised Matryoshka-Adaptor: This variant uses only corpus embeddings as input. The adaptor is trained with a combination of top-k similarity loss and pairwise similarity loss, both calculated across multiple Matryoshka embeddings with various reduced dimensions. Supervised Matryoshka-Adaptor: In this variant, query embeddings and query-corpus pairs are provided as supplementary inputs, and a ranking loss is added to the top-k and pairwise losses to train the adaptor. As in the unsupervised setting, all losses are computed across Matryoshka embeddings with various reduced dimensions.
  • Figure 4: Experimental results of the unsupervised Matryoshka-Adaptor applied to three different embedding models: OpenAI text-embedding-3-large (with 3072 dimensions), OpenAI text-embedding-3-small (with 1536 dimensions), and Google multimodal (with 1408 dimensions). Text embedding results were obtained using 8 BEIR datasets, while multimodal embedding results were obtained using 5 Fashion-200K datasets.
  • Figure 5: Experimental results of the supervised Matryoshka-Adaptor on retrieval tasks, utilizing three different embedding models: OpenAI text-embedding-3-large (on 8 BEIR datasets), Google Gecko multilingual (on 17 MIRACL datasets), and Google multimodal (on 5 Fashion-200K datasets). Additional results are in the Appendix.
  • ...and 12 more figures
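The pairwise and top-k similarity losses described in the Figure 2 caption, summed over several Matryoshka dimensions as in Figure 3, can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes cosine similarity, an absolute-difference discrepancy averaged over pairs, every distinct pair for the pairwise loss (in practice one would sample random pairs), and the k nearest neighbours in the original space for the top-k loss. The supervised ranking loss is omitted, and all names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def all_pairs(n):
    """Pairs for the pairwise loss (here every distinct pair instead of a random sample)."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


def top_k_pairs(originals, k):
    """Pairs (i, j) where j is among the k nearest neighbours of i in the original space."""
    pairs = []
    for i, u in enumerate(originals):
        sims = sorted(
            ((cosine(u, v), j) for j, v in enumerate(originals) if j != i),
            reverse=True,
        )
        pairs.extend((i, j) for _, j in sims[:k])
    return pairs


def similarity_loss(originals, adapted, dims, pairs):
    """Mean |cos(original pair) - cos(reduced adapted pair)|, summed over Matryoshka dims."""
    total = 0.0
    for m in dims:
        reduced = [a[:m] for a in adapted]  # keep the first m coordinates
        diffs = [
            abs(cosine(originals[i], originals[j]) - cosine(reduced[i], reduced[j]))
            for i, j in pairs
        ]
        total += sum(diffs) / len(diffs)
    return total


originals = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pairs = all_pairs(len(originals))
print(similarity_loss(originals, originals, dims=[2], pairs=pairs))  # 0.0
print(round(similarity_loss(originals, originals, dims=[1], pairs=pairs), 4))  # 0.3333
print(top_k_pairs(originals, k=1)[:2])  # [(0, 2), (1, 2)]
```

In a real setup, `adapted` would be the adaptor's output and the loss would be minimized by gradient descent. Swapping `all_pairs` for `top_k_pairs` turns the pairwise loss into the top-k loss, which concentrates on preserving local neighbourhoods.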