
Exploring $\ell_0$ Sparsification for Inference-free Sparse Retrievers

Xinjie Shen, Zhichao Geng, Yang Yang

TL;DR

This work tackles efficient inference-free sparse retrieval by proposing two $\ell_0$-inspired sparsification techniques, $\ell_0$ Mask Loss and $\ell_0$ Approximation Activation, that selectively regularize the document-side representations. In BEIR out-of-domain evaluations, after fine-tuning on MS MARCO and indexing with OpenSearch, the approach achieves state-of-the-art results among inference-free sparse retrievers and is competitive with leading Siamese sparse models. The study also analyzes the efficiency–performance trade-offs across FLOPS penalties, masking thresholds, and activation counts, offering practical guidance for real-world deployment. Overall, the methods deliver substantial efficiency gains while maintaining strong retrieval quality in a challenging, inference-free setting.

Abstract

With increasing demands for efficiency, information retrieval has developed a branch of sparse retrieval, which has further advanced towards inference-free retrieval, where documents are encoded at indexing time and queries require no model inference. Existing sparse retrieval models rely on FLOPS regularization for sparsification. Because this mechanism was originally designed for Siamese encoders, it is considered suboptimal in inference-free scenarios, which are asymmetric. Previous attempts to adapt FLOPS for inference-free scenarios have been limited to rule-based methods, leaving the potential of sparsification approaches for inference-free retrieval models largely unexplored. In this paper, we explore $\ell_0$-inspired sparsification for inference-free retrievers. Through comprehensive out-of-domain evaluation on the BEIR benchmark, our method achieves state-of-the-art performance among inference-free sparse retrieval models and is comparable to leading Siamese sparse retrieval models. Furthermore, we provide insights into the trade-off between retrieval effectiveness and computational efficiency, demonstrating practical value for real-world applications.
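
For readers unfamiliar with the quantities mentioned above, the following is a minimal sketch of the standard FLOPS regularizer (the sum over vocabulary terms of the squared mean absolute activation in a batch) and of the $\ell_0$ norm of a sparse document representation. The function `masked_flops_regularizer` is a hypothetical illustration of what "selectively regularizing" documents above a sparsity threshold `t` could look like; its exact form, including averaging only over the selected documents, is an assumption and is not taken from the paper. All function names are illustrative.

```python
import numpy as np


def l0_norm(reps: np.ndarray) -> np.ndarray:
    """Number of non-zero term weights per document (rows are documents)."""
    return np.count_nonzero(reps, axis=1)


def flops_regularizer(reps: np.ndarray) -> float:
    """FLOPS regularizer: sum over vocabulary terms of the squared
    mean absolute activation across the batch."""
    mean_abs = np.abs(reps).mean(axis=0)
    return float(np.sum(mean_abs ** 2))


def masked_flops_regularizer(reps: np.ndarray, t: int) -> float:
    """ASSUMPTION (not the paper's definition): penalize only documents
    whose l0 norm exceeds the threshold t, leaving sparse ones untouched."""
    mask = l0_norm(reps) > t
    if not mask.any():
        return 0.0
    return flops_regularizer(reps[mask])


# Toy batch: 2 documents over a 4-term vocabulary.
reps = np.array([
    [0.5, 0.0, 0.0, 0.0],   # already sparse: 1 active term
    [1.0, 2.0, 0.0, 1.0],   # denser: 3 active terms
])
print(l0_norm(reps))                      # per-document active-term counts
print(flops_regularizer(reps))            # penalty over the whole batch
print(masked_flops_regularizer(reps, 2))  # penalty over the dense document only
```

In this toy batch only the second document exceeds `t = 2`, so the masked variant leaves the already sparse first document unpenalized, which is the intuition behind regularizing selectively rather than uniformly.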

Paper Structure

This paper contains 20 sections, 5 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Search relevance vs efficiency, varying $\lambda_d$.
  • Figure 2: $\lambda_d$ vs encoded document sparsity. "x" denotes that the baseline model collapses during training at $\lambda_d=0.12$.
  • Figure 3: Search relevance vs sparsity, varying $t$.