Scaling Laws For Dense Retrieval
Yan Fang, Jingtao Zhan, Qingyao Ai, Jiaxin Mao, Weihang Su, Jia Chen, Yiqun Liu
TL;DR
This paper demonstrates that dense retrieval models exhibit clear power-law scaling with respect to both model size and annotated data size when evaluated with a continuous metric, contrastive entropy. By systematically varying pre-trained backbones and annotation strategies on MSMARCO and T2Ranking, the authors establish model-size and data-size scaling laws and introduce a joint law to capture their interaction, enabling budget-aware resource planning. They further show how annotation quality modulates scaling and propose an application to optimize training under cost constraints, including the impact of inference costs. The work provides a practical framework for predicting DR performance, guiding data collection, model selection, and annotation strategies, while outlining limitations and directions for expanding scaling analyses to broader architectures and domains.
Abstract
Scaling up neural models has yielded significant advancements in a wide array of tasks, particularly in language generation. Previous studies have found that the performance of neural models frequently adheres to predictable scaling laws, correlated with factors such as training set size and model size. This insight is invaluable, especially as large-scale experiments grow increasingly resource-intensive. Yet, such scaling laws have not been fully explored in dense retrieval, owing to the discrete nature of retrieval metrics and the complex relationships between training data and model sizes in retrieval tasks. In this study, we investigate whether the performance of dense retrieval models follows scaling laws as other neural models do. We propose to use contrastive log-likelihood as the evaluation metric and conduct extensive experiments with dense retrieval models implemented with different numbers of parameters and trained with different amounts of annotated data. Results indicate that, under our settings, the performance of dense retrieval models follows a precise power-law scaling related to the model size and the number of annotations. Additionally, we examine scaling with prevalent data augmentation methods to assess the impact of annotation quality, and apply the scaling law to find the best resource allocation strategy under a budget constraint. We believe that these insights will significantly contribute to understanding the scaling effect of dense retrieval models and offer meaningful guidance for future research endeavors.
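To make the two central ideas of the abstract concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the contrastive log-likelihood is computed as a softmax over one positive and several negative passage scores per query, and it assumes a pure power law of the form loss = a * size^(-b) with no irreducible-loss term; the exact functional form used in the paper is not stated in this section. The function names `contrastive_entropy` and `fit_power_law` are hypothetical, and the data are synthetic.

```python
import numpy as np


def contrastive_entropy(pos_scores, neg_scores):
    """Mean negative log-likelihood of the positive passage among candidates.

    pos_scores: shape (num_queries,), score of the relevant passage.
    neg_scores: shape (num_queries, num_negatives), scores of negatives.
    Assumption: the likelihood is a softmax over [positive, negatives].
    """
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)
    logits = np.concatenate([pos, neg], axis=1)
    # Numerically stable log-sum-exp over all candidates of each query.
    m = logits.max(axis=1, keepdims=True)
    log_z = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    return float(np.mean(log_z - pos[:, 0]))


def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-b) by least squares in log-log space."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), deg=1)
    return float(np.exp(intercept)), float(-slope)


# Synthetic, illustrative data only (not results from the paper).
sizes = np.array([1e6, 1e7, 1e8, 1e9])
losses = 5.0 * sizes ** -0.1
a, b = fit_power_law(sizes, losses)
```

Unlike ranking metrics based on a cutoff, this quantity changes smoothly as the scores change, which is what makes a curve fit of this kind meaningful. A fit with an additive constant for irreducible loss would need a nonlinear optimizer instead of the log-log regression used here.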
