Scaling Laws for Embedding Dimension in Information Retrieval

Julian Killingback, Mahta Rafiee, Madine Manas, Hamed Zamani

TL;DR

This work investigates how embedding dimension governs dense retrieval performance, showing that contrastive entropy follows a power law with respect to embedding size, with diminishing returns as the dimension grows. By analyzing two model families (BERT and Ettin) across a range of model sizes and embedding dimensions, the authors derive a dimension-only scaling law and a joint law that also incorporates model size, enabling prediction of retrieval performance under fixed compute budgets. The findings show that larger models with smaller embeddings can often outperform smaller models with larger embeddings, especially under practical scoring constraints, and that scaling behavior is more predictable for in-domain tasks than for out-of-domain tasks. The study provides practical design guidelines for building efficient, scalable dense retrievers and points to future work on sparse representations and training strategies that make better use of high-dimensional embedding spaces.

Abstract

Dense retrieval, which encodes each query and document into a single dense vector, has become the dominant neural retrieval approach due to its simplicity and compatibility with fast approximate nearest neighbor algorithms. As the tasks dense retrieval performs grow in complexity, the fundamental limitations of the underlying data structure and similarity metric -- namely vectors and inner products -- become more apparent. Recent work has shown theoretical limitations inherent to single vectors and inner products that are generally tied to the embedding dimension. Given the importance of embedding dimension for retrieval capacity, understanding how dense retrieval performance changes as embedding dimension is scaled is fundamental to building next-generation retrieval models that balance effectiveness and efficiency. In this work, we conduct a comprehensive analysis of the relationship between embedding dimension and retrieval performance. Our experiments include two model families and a range of model sizes from each to construct a detailed picture of embedding scaling behavior. We find that the scaling behavior fits a power law, allowing us to derive scaling laws for performance given only embedding dimension, as well as a joint law accounting for embedding dimension and model size. Our analysis shows that for evaluation tasks aligned with the training task, performance continues to improve as embedding size increases, though with diminishing returns. For evaluation data that is less aligned with the training task, we find that performance is less predictable, and for certain tasks it degrades at larger embedding dimensions. We hope our work provides additional insight into the limitations of embeddings and their behavior, and offers a practical guide for selecting model size and embedding dimension to achieve optimal performance with reduced storage and compute costs.
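
To make the power-law claim concrete, the following is a minimal sketch of fitting such a law to loss measurements taken at several embedding dimensions. It assumes the simplest possible form, L(d) = A * d^(-alpha), which may differ from the paper's fitted law (for example, that law may include an irreducible-loss term). The function names and the data points are hypothetical and synthetic, not results from the paper.

```python
import math


def fit_power_law(dims, losses):
    """Fit L(d) = A * d**(-alpha) by least squares in log-log space.

    Assumed functional form for illustration only; taking logs gives
    log L = log A - alpha * log d, which is an ordinary linear regression.
    """
    xs = [math.log(d) for d in dims]
    ys = [math.log(loss) for loss in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (A, alpha)


def predict(d, A, alpha):
    """Predicted loss at embedding dimension d under the assumed law."""
    return A * d ** (-alpha)


# Synthetic measurements generated from a known law (NOT data from the paper).
dims = [32, 64, 128, 256, 512]
true_A, true_alpha = 2.0, 0.3
losses = [true_A * d ** (-true_alpha) for d in dims]

A, alpha = fit_power_law(dims, losses)

# Diminishing returns: doubling the dimension helps less at larger sizes.
gain_small = predict(32, A, alpha) - predict(64, A, alpha)
gain_large = predict(512, A, alpha) - predict(1024, A, alpha)
print(f"A={A:.3f}, alpha={alpha:.3f}")
print(f"gain 32->64: {gain_small:.4f}, gain 512->1024: {gain_large:.4f}")
```

Under this assumed form, each doubling of the dimension multiplies the loss by the same factor, 2^(-alpha), so the absolute improvement shrinks as the dimension grows. This is the diminishing-returns pattern the abstract describes for in-domain tasks.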

Paper Structure (27 sections, 13 equations, 7 figures, 2 tables)

This paper contains 27 sections, 13 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Scaling behavior of contrastive entropy relative to embedding dimension on TREC DL Combined and MSMARCO Dev. The points represent the observed contrastive entropy while the line represents the fitted dimension-only scaling law. The BERT model shown in both plots is BERT-L8-H512-A8. The Ettin model shown in both plots is Ettin L19-H512-A8.
  • Figure 2: Empirical results and joint scaling laws for the BERT model family on MSMARCO Dev and TREC DL Combined. The points represent empirical results at various embedding dimensions and model sizes which are represented by the point colors. The dashed lines represent the joint scaling laws fit on the observed data.
  • Figure 3: Empirical results and joint scaling laws for the Ettin model family on MSMARCO Dev and TREC DL Combined. The points represent empirical results at various embedding dimensions and model sizes which are represented by the point colors. The dashed lines represent the joint scaling laws fit on the observed data.
  • Figure 4: Out-of-domain evaluation on Paper Retrieval and Legal QA for BERT and Ettin families.
  • Figure 5: Scaling behavior based on ranking metrics (RR@10 and R@1000) for BERT model family on MSMARCO Dev.
  • ...and 2 more figures
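
The Figure 5 caption refers to RR@10 and R@1000 without expanding them. Assuming these denote the standard reciprocal rank cut off at rank 10 and recall at rank 1000, a minimal per-query sketch looks like the following. The function names and the toy ranking are hypothetical.

```python
def reciprocal_rank_at_k(ranked_ids, relevant_ids, k=10):
    """Return 1/rank of the first relevant document in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids, relevant_ids, k=1000):
    """Return the fraction of relevant documents found in the top k."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


# Toy example: one query, four retrieved documents, two relevant documents.
ranked = ["d7", "d2", "d9", "d4"]
relevant = {"d9", "d5"}

rr10 = reciprocal_rank_at_k(ranked, relevant, k=10)  # first hit is at rank 3
r1000 = recall_at_k(ranked, relevant, k=1000)        # 1 of 2 relevant found
print(rr10, r1000)
```

Averaging the per-query values over an evaluation set gives the aggregate figures usually reported for each metric.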