
SPLATE: Sparse Late Interaction Retrieval

Thibault Formal, Stéphane Clinchant, Hervé Déjean, Carlos Lassance

TL;DR

This work addresses the computational burden of late-interaction IR pipelines by bridging dense ColBERTv2 representations with sparse retrieval. It introduces SPLATE, a lightweight MLM adapter that maps frozen ColBERTv2 embeddings to sparse BoW vectors via a residual two-layer MLP, enabling efficient candidate generation on traditional inverted indexes while keeping the same MaxSim mechanism for re-ranking. Trained with distillation on MS MARCO, SPLATE achieves competitive end-to-end performance with a much smaller candidate set (roughly 50 documents), and demonstrates strong in-domain and out-of-domain results across MS MARCO, BEIR, and LoTTE. The findings offer a practical, CPU-friendly route for deploying late-interaction models within existing sparse retrieval infrastructures, and open avenues for further exploration of cross-representation transfer between dense and sparse IR.

Abstract

The late interaction paradigm introduced with ColBERT stands out in the neural Information Retrieval space, offering a compelling effectiveness-efficiency trade-off across many benchmarks. Efficient late interaction retrieval is based on an optimized multi-step strategy, where an approximate search first identifies a set of candidate documents to re-rank exactly. In this work, we introduce SPLATE, a simple and lightweight adaptation of the ColBERTv2 model which learns an "MLM adapter", mapping its frozen token embeddings to a sparse vocabulary space with a partially learned SPLADE module. This allows us to perform the candidate generation step in late interaction pipelines with traditional sparse retrieval techniques, making it particularly appealing for running ColBERT in CPU environments. Our SPLATE ColBERTv2 pipeline achieves the same effectiveness as the PLAID ColBERTv2 engine by re-ranking 50 documents that can be retrieved in under 10 ms.
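
The two-step strategy described in the abstract (sparse candidate generation followed by exact MaxSim re-ranking) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `maxsim` and `retrieve_then_rerank` are hypothetical, and a dense document-term matrix stands in for a real inverted index purely to keep the example short.

```python
import numpy as np

def maxsim(query_emb, doc_emb):
    """Late-interaction score: for each query token, take the highest
    similarity over all document tokens, then sum over query tokens."""
    sim = query_emb @ doc_emb.T          # shape (n_query_tokens, n_doc_tokens)
    return float(sim.max(axis=1).sum())

def retrieve_then_rerank(q_sparse, q_emb, doc_sparse, doc_embs, k=50):
    """Hypothetical two-stage pipeline.

    q_sparse   : (V,) sparse query vector over the vocabulary
    q_emb      : (n_query_tokens, d) query token embeddings
    doc_sparse : (n_docs, V) document vectors (assumption: a dense matrix
                 standing in for an inverted index)
    doc_embs   : list of (n_doc_tokens, d) arrays, one per document
    """
    # Stage 1: candidate generation with a sparse dot product.
    sparse_scores = doc_sparse @ q_sparse
    k = min(k, len(sparse_scores))
    candidates = np.argsort(-sparse_scores, kind="stable")[:k]
    # Stage 2: exact re-ranking of the k candidates with MaxSim.
    reranked = sorted(candidates,
                      key=lambda i: maxsim(q_emb, doc_embs[i]),
                      reverse=True)
    return [int(i) for i in reranked]
```

Only the `k` candidates from the first stage are scored with MaxSim, which is why a small `k` (the abstract reports 50) keeps the re-ranking cost low.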

Paper Structure (14 sections, 2 equations, 3 figures, 3 tables)

This paper contains 14 sections, 2 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: (Left) SPLATE relies on the same representations $(h_i)_{i \in t}$ to learn sparse BoW with SPLADE (candidate generation) and to compute late interactions (re-ranking). (Right) Inference: SPLATE ColBERTv2 maps the representations of the query tokens to a sparse vector, which is used to retrieve $k$ documents from a pre-computed sparse index (R setting). In the e2e setting, representations are gathered from the ColBERT index to re-rank the candidates exactly with MaxSim.
  • Figure 2: Candidate generation approximate accuracy on MS MARCO dev -- SPLATE (R). Dotted lines ($\blacksquare$) represent $R(10)$, solid lines (✖) represent $R(100)$.
  • Figure 3: Impact of $k$ and $(k_q,k_d)$ on SPLATE (e2e) out-of-domain performance -- $Success@5$ on LoTTE (test pooled Search). The orange line represents ColBERTv2.
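
The mapping from token representations $(h_i)_{i \in t}$ to a sparse BoW vector, mentioned in Figure 1, might look roughly like the sketch below. Several details are assumptions rather than facts taken from the text above: the ReLU inside the MLP, the matching dimensions of `h` and `vocab_emb`, the reading of $(k_q,k_d)$ as the number of terms kept in query and document vectors, and every name (`splate_sparse_vector`, `w1`, `b1`, `w2`, `b2`, `vocab_emb`, `top_k`). The pooling step uses the standard SPLADE form, $w_v = \max_i \log(1 + \mathrm{ReLU}(\text{logit}_{iv}))$.

```python
import numpy as np

def splate_sparse_vector(h, w1, b1, w2, b2, vocab_emb, top_k):
    """Hypothetical sketch: frozen token embeddings -> pruned sparse BoW vector.

    h         : (n_tokens, d) frozen token embeddings
    w1, b1    : first MLP layer, shapes (d, m) and (m,)
    w2, b2    : second MLP layer, shapes (m, d) and (d,)
    vocab_emb : (V, d) vocabulary embeddings (assumed to share dimension d)
    top_k     : number of vocabulary terms to keep (k_q or k_d, by assumption)
    """
    # Two-layer MLP with a residual connection (ReLU is an assumption).
    hidden = np.maximum(h @ w1 + b1, 0.0)
    adapted = h + hidden @ w2 + b2
    # Per-token logits over the vocabulary.
    logits = adapted @ vocab_emb.T                      # (n_tokens, V)
    # SPLADE-style pooling: log-saturated ReLU, then max over tokens.
    weights = np.log1p(np.maximum(logits, 0.0)).max(axis=0)
    # Keep only the top_k highest-weighted terms; zero out the rest.
    keep = np.argsort(-weights, kind="stable")[:top_k]
    pruned = np.zeros_like(weights)
    pruned[keep] = weights[keep]
    return pruned
```

Under this reading, a smaller `top_k` gives shorter posting lists and faster retrieval, at the cost of candidate-generation accuracy.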