BM25S: Orders of magnitude faster lexical search via eager sparse scoring

Xing Han Lù

TL;DR

BM25S introduces an eager index-time scoring approach that precomputes nonzero BM25 contributions and stores them in a SciPy sparse matrix, yielding orders-of-magnitude speedups over Python-based baselines. By reformulating the BM25 score and leveraging CSC sparse storage, it eliminates costly per-query scoring and enables efficient top-k retrieval with minimal dependencies. The method accommodates multiple BM25 variants and offers practical advantages for edge deployment and reproducible research, achieving substantial throughput gains on BEIR benchmarks. The work also provides a careful analysis of tokenization’s impact and variant-level performance, outlining practical trade-offs and deployment considerations.

Abstract

We introduce BM25S, an efficient Python-based implementation of BM25 that depends only on NumPy and SciPy. BM25S achieves up to a 500x speedup compared to the most popular Python-based framework by eagerly computing BM25 scores during indexing and storing them in sparse matrices. It also achieves considerable speedups compared to highly optimized Java-based implementations, which are used by popular commercial products. Finally, BM25S reproduces the exact implementation of five BM25 variants based on Kamphuis et al. (2020) by extending eager scoring to non-sparse variants using a novel score shifting method. The code can be found at https://github.com/xhluca/bm25s
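
To make the idea of eager scoring concrete, the following is a minimal sketch and not the BM25S code itself. It assumes a Lucene-style BM25 formula, pre-tokenized documents, and the hypothetical helper names `build_index` and `retrieve`; the defaults `k1=1.5` and `b=0.75` are illustrative choices. Every nonzero (document, token) score is computed once at index time and stored in a SciPy CSC matrix, so a query reduces to selecting the columns of its tokens, summing them, and taking the top k.

```python
import math
from collections import Counter

import numpy as np
from scipy import sparse


def build_index(corpus_tokens, k1=1.5, b=0.75):
    """Precompute BM25 scores for every (doc, token) pair with tf > 0.

    Assumption: Lucene-style scoring, idf * tf / (tf + k1 * (1 - b + b * |D| / avgdl)).
    Returns the token -> column mapping and a (n_docs x n_tokens) CSC score matrix.
    """
    # Assign each distinct token a column index.
    vocab = {}
    for doc in corpus_tokens:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))

    n_docs = len(corpus_tokens)
    doc_lens = np.array([len(d) for d in corpus_tokens], dtype=np.float64)
    avg_len = doc_lens.mean()

    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in corpus_tokens for tok in set(doc))

    rows, cols, vals = [], [], []
    for doc_id, doc in enumerate(corpus_tokens):
        length_norm = k1 * (1.0 - b + b * doc_lens[doc_id] / avg_len)
        for tok, tf in Counter(doc).items():
            idf = math.log(1.0 + (n_docs - df[tok] + 0.5) / (df[tok] + 0.5))
            rows.append(doc_id)
            cols.append(vocab[tok])
            vals.append(idf * tf / (tf + length_norm))

    # CSC layout makes slicing out the columns of the query tokens cheap.
    scores = sparse.csc_matrix((vals, (rows, cols)), shape=(n_docs, len(vocab)))
    return vocab, scores


def retrieve(query_tokens, vocab, scores, k=2):
    """Sum the precomputed columns of the query tokens and return the top-k docs."""
    token_ids = [vocab[t] for t in query_tokens if t in vocab]
    if not token_ids:
        return np.array([], dtype=int), np.array([])

    doc_scores = np.asarray(scores[:, token_ids].sum(axis=1)).ravel()

    # argpartition finds the k best in linear time; only those k are then sorted.
    k = min(k, doc_scores.shape[0])
    top = np.argpartition(-doc_scores, k - 1)[:k]
    top = top[np.argsort(-doc_scores[top])]
    return top, doc_scores[top]
```

Under these assumptions, no BM25 arithmetic happens at query time: the cost is dominated by column slicing and a partial sort. This sketch covers only the sparse case, where a token absent from a document contributes zero; the score shifting used for the non-sparse variants is not shown.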

Paper Structure (19 sections, 5 equations, 3 tables)