Efficiency and Effectiveness of SPLADE Models on Billion-Scale Web Document Title
Taeryun Won, Tae Kwan Lee, Hiun Kim, Hyemin Lee
TL;DR
This study benchmarks BM25, SPLADE, and Expanded-SPLADE for billion-scale web-title retrieval, focusing on the trade-off between effectiveness and efficiency when using sparse lexical representations. It demonstrates that sparse models improve retrieval quality over BM25, with Expanded-SPLADE offering the most practical balance by combining competitive effectiveness with lower latency via pruning. The authors introduce document-centric pruning, top-k query term selection, and boolean thresholding to reduce compute, showing substantial latency gains with modest loss in retrieval quality. The results inform deployment of sparse retrieval in large-scale search engines, especially for languages written in non-Latin scripts, such as Korean.
Abstract
This paper presents a comprehensive comparison of BM25, SPLADE, and Expanded-SPLADE models in the context of large-scale web document retrieval. We evaluate the effectiveness and efficiency of these models on datasets ranging from tens of millions to billions of web document titles. SPLADE and Expanded-SPLADE, which utilize sparse lexical representations, demonstrate superior retrieval performance compared to BM25, especially for complex queries. However, these models incur higher computational costs. To mitigate these costs, we introduce three pruning strategies: document-centric pruning, top-k query term selection, and boolean queries with a term threshold. Together they improve the models' efficiency without significantly sacrificing retrieval performance. The results show that Expanded-SPLADE strikes the best balance between effectiveness and efficiency, particularly when handling large datasets. Our findings offer valuable insights for deploying sparse retrieval models in large-scale search engines.
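To make the three pruning strategies concrete, the following is a minimal Python sketch over toy sparse vectors. It is illustrative only and is not the authors' implementation. All function names, parameters (`max_terms`, `k`, `min_match`), and the sample weights are hypothetical. In particular, the abstract does not define the boolean term threshold, so the sketch assumes it means "a document must contain at least `min_match` of the selected query terms".

```python
from collections import defaultdict


def prune_document(doc_weights, max_terms):
    """Document-centric pruning: keep the max_terms highest-weight terms of one document."""
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:max_terms])


def select_query_terms(query_weights, k):
    """Top-k query term selection: keep the k highest-weight query terms."""
    ranked = sorted(query_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:k])


def build_index(docs):
    """Build an inverted index: term -> list of (doc_id, weight) postings."""
    index = defaultdict(list)
    for doc_id, weights in docs.items():
        for term, weight in weights.items():
            index[term].append((doc_id, weight))
    return index


def search(index, query_weights, min_match=1):
    """Score documents by sparse dot product.

    Assumed reading of the boolean term threshold: a document is returned
    only if it matches at least min_match of the query terms.
    """
    scores = defaultdict(float)
    matches = defaultdict(int)
    for term, q_weight in query_weights.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
            matches[doc_id] += 1
    hits = [(d, s) for d, s in scores.items() if matches[d] >= min_match]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical term weights, standing in for SPLADE-style model output.
docs = {
    "d1": {"seoul": 1.2, "weather": 0.9, "forecast": 0.4, "today": 0.1},
    "d2": {"seoul": 0.8, "hotel": 1.1, "cheap": 0.3},
    "d3": {"weather": 0.2, "radar": 1.0},
}
query = {"seoul": 1.0, "weather": 0.8, "forecast": 0.5, "the": 0.05}

index = build_index({d: prune_document(w, max_terms=2) for d, w in docs.items()})
pruned_query = select_query_terms(query, k=2)
results = search(index, pruned_query, min_match=2)
```

Each step shrinks the work done at query time: document pruning shortens the posting lists, query term selection reduces how many lists are traversed, and the match threshold discards weakly matching candidates before ranking.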
