Exploring $\ell_0$ Sparsification for Inference-free Sparse Retrievers
Xinjie Shen, Zhichao Geng, Yang Yang
TL;DR
This work tackles efficient inference-free sparse retrieval by proposing two $\ell_0$-inspired sparsification techniques, $\ell_0$ Mask Loss and $\ell_0$ Approximation Activation, that selectively regularize the document-side representations. In out-of-domain BEIR evaluations, after fine-tuning on MS MARCO and indexing with OpenSearch, the approach achieves state-of-the-art results among inference-free sparse retrievers and is competitive with leading Siamese sparse models. The study also analyzes the efficiency–performance trade-offs across FLOPS penalties, masking thresholds, and activation counts, offering practical guidance for real-world deployment. Overall, the methods deliver substantial efficiency gains while maintaining strong retrieval quality in a challenging, inference-free setting.
Abstract
With increasing demands for efficiency, information retrieval has developed a branch of sparse retrieval, which has further advanced towards inference-free retrieval, where documents are encoded at indexing time and no model inference is performed for queries. Existing sparse retrieval models rely on FLOPS regularization for sparsification. However, this mechanism was originally designed for Siamese encoders and is considered suboptimal in inference-free scenarios, which are asymmetric. Previous attempts to adapt FLOPS for inference-free scenarios have been limited to rule-based methods, leaving the potential of sparsification approaches for inference-free retrieval models largely unexplored. In this paper, we explore $\ell_0$-inspired sparsification for inference-free retrievers. Through comprehensive out-of-domain evaluation on the BEIR benchmark, our method achieves state-of-the-art performance among inference-free sparse retrieval models and is comparable to leading Siamese sparse retrieval models. Furthermore, we provide insights into the trade-off between retrieval effectiveness and computational efficiency, demonstrating practical value for real-world applications.
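
To make the terms in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It illustrates three ideas: the standard FLOPS regularizer (the sum over vocabulary terms of the squared mean activation in a batch), the $\ell_0$ norm of a sparse document representation (its count of non-zero terms), and inference-free scoring, where the query is only tokenized and matched against precomputed document weights. The function names, the dictionary-based representation, and the optional `token_weights` lookup table are illustrative assumptions; the $\ell_0$ Mask Loss and $\ell_0$ Approximation Activation themselves are not reproduced here.

```python
from typing import Dict, List, Optional

SparseRep = Dict[str, float]  # term -> weight (hypothetical representation)


def flops_penalty(batch: List[SparseRep]) -> float:
    """FLOPS regularizer: sum over terms of (mean |weight| over the batch)^2."""
    n = len(batch)
    totals: Dict[str, float] = {}
    for rep in batch:
        for term, weight in rep.items():
            totals[term] = totals.get(term, 0.0) + abs(weight)
    return sum((total / n) ** 2 for total in totals.values())


def l0_norm(rep: SparseRep) -> int:
    """Number of non-zero terms, i.e. the posting-list entries this document adds."""
    return sum(1 for weight in rep.values() if weight != 0.0)


def inference_free_score(
    query_tokens: List[str],
    doc_rep: SparseRep,
    token_weights: Optional[Dict[str, float]] = None,
) -> float:
    """Score a query without running a model on it.

    Each distinct query token contributes its (precomputed) document weight,
    optionally scaled by a static lookup weight such as an IDF-style value.
    With token_weights=None every token is weighted 1.0 (an assumption).
    """
    score = 0.0
    for token in set(query_tokens):
        q_weight = 1.0 if token_weights is None else token_weights.get(token, 0.0)
        score += q_weight * doc_rep.get(token, 0.0)
    return score


# Toy document-side representations, as if produced at indexing time.
docs = [{"sparse": 1.0, "retrieval": 2.0}, {"retrieval": 2.0}]
print(flops_penalty(docs))
print([l0_norm(d) for d in docs])
print(inference_free_score(["sparse", "retrieval", "model"], docs[0]))
```

Because only the document side is encoded by a model in this setting, sparsity pressure falls entirely on the document representations, which is the asymmetry the abstract refers to.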
