Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations

Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini

TL;DR

The paper tackles efficient approximate retrieval over learned sparse embeddings by introducing Seismic, a block-based inverted-index design that leverages the concentration of importance in sparse representations. By partitioning inverted lists into geometrically coherent blocks with per-block summary vectors and coupling static pruning with a forward index, Seismic achieves sub-millisecond latency and substantial speedups over state-of-the-art baselines on MS MARCO embeddings, while preserving high recall. Key contributions include the empirical observation of concentration of importance, the block-wise index architecture with summary-based pruning, and extensive ablations across multiple datasets and LSR models. The work demonstrates that learned sparse retrieval can be both fast and effective, enabling scalable first-stage retrieval for contextual embeddings and informing future directions in compression and architecture for sparse-vector search.

Abstract

Learned sparse representations form an attractive class of contextual embeddings for text retrieval. That is so because they are effective models of relevance and are interpretable by design. Despite their apparent compatibility with inverted indexes, however, retrieval over sparse embeddings remains challenging. That is due to the distributional differences between learned embeddings and term frequency-based lexical models of relevance such as BM25. Recognizing this challenge, a great deal of research has gone into, among other things, designing retrieval algorithms tailored to the properties of learned sparse representations, including approximate retrieval systems. In fact, this task featured prominently in the latest BigANN Challenge at NeurIPS 2023, where approximate algorithms were evaluated on a large benchmark dataset by throughput and recall. In this work, we propose a novel organization of the inverted index that enables fast yet effective approximate retrieval over learned sparse embeddings. Our approach organizes inverted lists into geometrically-cohesive blocks, each equipped with a summary vector. During query processing, we quickly determine if a block must be evaluated using the summaries. As we show experimentally, single-threaded query processing using our method, Seismic, reaches sub-millisecond per-query latency on various sparse embeddings of the MS MARCO dataset while maintaining high recall. Our results indicate that Seismic is one to two orders of magnitude faster than state-of-the-art inverted index-based solutions and further outperforms the winning (graph-based) submissions to the BigANN Challenge by a significant margin.
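
The block-and-summary idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: sparse vectors are non-negative `{coordinate: value}` dicts, blocks are fixed-size chunks of each inverted list rather than geometrically-cohesive clusters, each summary is the coordinate-wise maximum of its block, and the skip rule built on `cut` and `heap_factor` is one plausible reading of those parameter names. The helper names `dot`, `build_index`, and `search` are hypothetical.

```python
import heapq


def dot(q, v):
    """Inner product of two sparse vectors stored as {coordinate: value} dicts."""
    if len(q) > len(v):
        q, v = v, q
    return sum(w * v.get(c, 0.0) for c, w in q.items())


def build_index(docs, block_size=2):
    """Return {coordinate: [(doc_ids, summary), ...]} for a list of sparse docs.

    Assumption: fixed-size blocks stand in for geometric (clustered) blocks.
    """
    postings = {}
    for doc_id, vec in enumerate(docs):
        for c in vec:
            postings.setdefault(c, []).append(doc_id)
    index = {}
    for c, ids in postings.items():
        # Order each inverted list by the documents' value at coordinate c.
        ids.sort(key=lambda d: -docs[d][c])
        blocks = []
        for i in range(0, len(ids), block_size):
            block = ids[i:i + block_size]
            # Summary: coordinate-wise maximum over the block. For
            # non-negative vectors this upper-bounds every document's score.
            summary = {}
            for d in block:
                for cc, val in docs[d].items():
                    summary[cc] = max(summary.get(cc, 0.0), val)
            blocks.append((block, summary))
        index[c] = blocks
    return index


def search(query, docs, index, k=2, cut=2, heap_factor=1.0):
    """Approximate top-k: visit the `cut` heaviest query coordinates and skip
    any block whose summary score cannot beat the current k-th best score."""
    top_coords = sorted(query, key=query.get, reverse=True)[:cut]
    heap = []  # min-heap of (score, doc_id)
    seen = set()
    for c in top_coords:
        for block, summary in index.get(c, []):
            # Assumed skip rule: a smaller heap_factor prunes more aggressively.
            if len(heap) == k and dot(query, summary) * heap_factor <= heap[0][0]:
                continue
            for d in block:
                if d in seen:
                    continue
                seen.add(d)
                score = dot(query, docs[d])  # exact score from the forward index
                if len(heap) < k:
                    heapq.heappush(heap, (score, d))
                elif score > heap[0][0]:
                    heapq.heapreplace(heap, (score, d))
    return sorted(heap, reverse=True)


# Toy data: `docs` doubles as the forward index.
docs = [
    {0: 1.0, 1: 0.5},
    {0: 0.2, 2: 0.9},
    {1: 0.8, 2: 0.1},
    {0: 0.9, 1: 0.9},
]
index = build_index(docs)
query = {0: 1.0, 1: 1.0}
results = search(query, docs, index)
top_ids = [doc_id for _, doc_id in results]
```

On this toy data, two of the four blocks are skipped based on their summaries alone, and the documents that are evaluated are scored exactly against the forward index.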

Paper Structure (20 sections, 3 equations, 7 figures, 2 tables, 2 algorithms)

This paper contains 20 sections, 3 equations, 7 figures, 2 tables, 2 algorithms.

Figures (7)

  • Figure 1: Fraction of $L_1$ mass preserved by keeping only the top non-zero entries with the largest absolute value.
  • Figure 2: Fraction of the full inner product (with $95\%$ confidence intervals) preserved when the inner product is computed over only the query and document coordinates with the largest absolute values.
  • Figure 3: The design of Seismic. Inverted lists are independently partitioned into geometrically-cohesive blocks. Each block is a set of document identifiers with a summary vector. The inner product of a query with the summary approximates the inner product attainable with the documents in that block. The forward index stores the complete vectors (including values).
  • Figure 4: MRR@10 on MS MARCO.
  • Figure 5: Fixed vs. geometric blocking. Data sampled from parameters: $\textsf{cut} \in \{1,\ldots, 10\}$ and $\textsf{heap\_factor} \in \{0.7, 0.8, 0.9, 1.0\}$.
  • ...and 2 more figures

Theorems & Definitions (1)

  • Definition 1: $\alpha$-mass subvector
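
This page lists Definition 1 by name only, so the following Python sketch rests on an assumed reading suggested by the caption of Figure 1: the $\alpha$-mass subvector keeps the fewest largest-magnitude entries whose absolute values sum to at least an $\alpha$ fraction of the vector's $L_1$ norm. The function name `alpha_mass_subvector` and the dict-based vector format are hypothetical.

```python
def alpha_mass_subvector(vec, alpha):
    """Keep the smallest prefix of entries (sorted by decreasing absolute
    value) whose L1 mass reaches alpha times the L1 norm of `vec`.

    `vec` is a sparse vector stored as a {coordinate: value} dict.
    Assumed definition; the source only names the concept.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    total = sum(abs(v) for v in vec.values())
    kept = {}
    mass = 0.0
    for coord, value in sorted(vec.items(), key=lambda cv: -abs(cv[1])):
        if mass >= alpha * total:
            break
        kept[coord] = value
        mass += abs(value)
    return kept


# Toy vector whose L1 norm is 1.0.
vec = {3: 0.5, 7: 0.25, 9: 0.125, 11: 0.125}
sub = alpha_mass_subvector(vec, 0.7)
```

Here `sub` keeps the two heaviest coordinates (3 and 7), which together carry 75% of the $L_1$ mass.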