SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking

Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant

TL;DR

This work tackles the challenge of efficient first-stage retrieval by enhancing sparse lexical representations. It introduces SPLADE, an end-to-end sparse model that uses log-saturation activation and FLOPS-based regularization to jointly learn query and document representations that expand terms while remaining highly sparse. SPLADE achieves results competitive with state-of-the-art dense methods on MS MARCO and outperforms prior sparse approaches, while offering explicit control over index size and computational cost. The approach enables scalable, inverted-index-friendly retrieval with tunable efficiency–effectiveness trade-offs, providing a practical alternative to dense retrieval in large collections.
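To make "inverted-index-friendly" concrete, here is a minimal sketch of how sparse term-weight vectors can be scored through posting lists. It assumes the relevance score is a dot product between the query and document vectors. The function names (`build_inverted_index`, `score`) and the toy weights are hypothetical illustrations, not taken from the paper or any SPLADE codebase.

```python
from collections import defaultdict

def build_inverted_index(doc_vectors):
    """Map each term to a posting list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        for term, weight in vec.items():
            if weight > 0.0:  # only nonzero weights are stored
                index[term].append((doc_id, weight))
    return index

def score(query_vec, index):
    """Dot product between the query and every document sharing a term."""
    scores = defaultdict(float)
    for term, q_weight in query_vec.items():
        # Only the posting lists of query terms are visited.
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy weights (assumption, for illustration only).
docs = {
    "d1": {"sparse": 1.2, "retrieval": 0.8, "index": 0.5},
    "d2": {"dense": 1.0, "retrieval": 0.6},
}
index = build_inverted_index(docs)
ranking = score({"sparse": 1.0, "retrieval": 0.5}, index)
```

The sparser the vectors, the shorter the posting lists and the fewer multiplications per query, which is why sparsity translates directly into index size and query cost.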

Abstract

In neural Information Retrieval, ongoing research is directed towards improving the first retriever in ranking pipelines. Learning dense embeddings to conduct retrieval using efficient approximate nearest neighbors methods has proven to work well. Meanwhile, there has been a growing interest in learning sparse representations for documents and queries that could inherit the desirable properties of bag-of-words models, such as the exact matching of terms and the efficiency of inverted indexes. In this work, we present a new first-stage ranker based on explicit sparsity regularization and a log-saturation effect on term weights, leading to highly sparse representations and competitive results with respect to state-of-the-art dense and sparse methods. Our approach is simple and trained end-to-end in a single stage. We also explore the trade-off between effectiveness and efficiency by controlling the contribution of the sparsity regularization.
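The two ingredients named in the abstract can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation. It assumes the weight of vocabulary term $j$ is $\sum_i \log(1 + \mathrm{ReLU}(w_{ij}))$ over input tokens $i$, and that the FLOPS regularizer is the sum over terms of the squared mean weight across a batch. The function names and toy logits are hypothetical.

```python
import math

def log_saturate(token_logits):
    """Aggregate per-token logits into one weight per vocabulary term.

    token_logits: list of rows, one per input token, each row holding a
    logit for every vocabulary term.
    Returns sum over tokens of log(1 + ReLU(logit)) for each term.
    """
    vocab_size = len(token_logits[0])
    weights = [0.0] * vocab_size
    for row in token_logits:
        for j, logit in enumerate(row):
            # ReLU zeroes negative logits; log1p dampens large ones.
            weights[j] += math.log1p(max(logit, 0.0))
    return weights

def flops_regularizer(batch_weights):
    """Sum over terms of the squared mean activation across the batch."""
    n = len(batch_weights)
    vocab_size = len(batch_weights[0])
    total = 0.0
    for j in range(vocab_size):
        mean_j = sum(vec[j] for vec in batch_weights) / n
        total += mean_j ** 2
    return total

# Hypothetical logits: 2 tokens over a 3-term vocabulary (assumption).
doc_a = log_saturate([[2.0, -1.0, 0.0], [0.5, -3.0, 0.0]])
doc_b = log_saturate([[0.0, -2.0, 4.0], [0.0, -0.5, 1.0]])
penalty = flops_regularizer([doc_a, doc_b])
```

In training, a penalty of this kind would be added to the ranking loss, scaled by a regularization strength $\lambda$. Raising $\lambda$ pushes more term weights to zero, which is the effectiveness versus efficiency trade-off the abstract refers to.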

Paper Structure

This paper contains 16 sections, 7 equations, 1 figure, 2 tables.

Figures (1)

  • Figure 1: Performance vs FLOPS for SPLADE models trained with different regularization strength $\lambda$ on MS MARCO