SPLATE: Sparse Late Interaction Retrieval
Thibault Formal, Stéphane Clinchant, Hervé Déjean, Carlos Lassance
TL;DR
This work addresses the computational burden of late-interaction IR pipelines by bridging dense ColBERTv2 representations with sparse retrieval. It introduces SPLATE, a lightweight MLM adapter that maps frozen ColBERTv2 embeddings to sparse bag-of-words (BoW) vectors via a residual two-layer MLP. Candidate generation can then run on traditional inverted indexes, while re-ranking still uses the same MaxSim mechanism. Trained with distillation on MS MARCO, SPLATE achieves competitive end-to-end performance with a much smaller candidate set (roughly 50 documents) and shows strong in-domain and out-of-domain results across MS MARCO, BEIR, and LoTTE. The findings offer a practical, CPU-friendly route for deploying late-interaction models within existing sparse retrieval infrastructure, and open avenues for further exploration of transfer between dense and sparse representations in IR.
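To make the adapter idea concrete, here is a minimal pure-Python sketch. It assumes a residual two-layer MLP applied to each frozen token embedding, followed by a projection to vocabulary space and SPLADE-style max pooling of log-saturated activations. The names adapter, sparse_vector, W1, W2, and E are hypothetical, the toy matrices stand in for learned weights, and details such as bias terms, normalization, and pruning are left out; it is an illustration, not the authors' implementation.

```python
import math

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def adapter(h, W1, W2):
    """Residual two-layer MLP (assumed form): h + W2 * relu(W1 * h)."""
    hidden = [max(0.0, x) for x in matvec(W1, h)]
    delta = matvec(W2, hidden)
    return [a + b for a, b in zip(h, delta)]

def sparse_vector(token_embs, W1, W2, E):
    """Map token embeddings to a sparse {term_id: weight} dictionary.

    E is a hypothetical vocabulary projection with one row per term.
    Pooling follows SPLADE-max: w_j = max_i log(1 + relu(logit_ij)).
    """
    weights = [0.0] * len(E)
    for h in token_embs:
        logits = matvec(E, adapter(h, W1, W2))
        for j, z in enumerate(logits):
            weights[j] = max(weights[j], math.log1p(max(0.0, z)))
    # Keep only non-zero terms, as an inverted index would.
    return {j: w for j, w in enumerate(weights) if w > 0.0}

# Toy example: 2-dimensional embeddings and a 3-term vocabulary.
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[0.0, 0.0], [0.0, 0.0]]   # zero update, so the adapter is the identity
E = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
tokens = [[1.0, -1.0], [0.5, 2.0]]
print(sparse_vector(tokens, W1, W2, E))
```

In this toy setup the third vocabulary term never receives a positive logit, so it is absent from the output, which is the sparsity an inverted index relies on.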
Abstract
The late interaction paradigm introduced with ColBERT stands out in the neural Information Retrieval space, offering a compelling effectiveness-efficiency trade-off across many benchmarks. Efficient late interaction retrieval is based on an optimized multi-step strategy, where an approximate search first identifies a set of candidate documents, which are then re-ranked exactly. In this work, we introduce SPLATE, a simple and lightweight adaptation of the ColBERTv2 model which learns an "MLM adapter", mapping its frozen token embeddings to a sparse vocabulary space with a partially learned SPLADE module. This allows us to perform the candidate generation step in late interaction pipelines with traditional sparse retrieval techniques, making it particularly appealing for running ColBERT in CPU environments. Our SPLATE ColBERTv2 pipeline achieves the same effectiveness as the PLAID ColBERTv2 engine by re-ranking 50 documents that can be retrieved in under 10 ms.
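The two-step strategy described in the abstract can be sketched in pure Python as follows: sparse candidate generation over an inverted index, then exact MaxSim re-ranking of the candidates. This is a toy illustration under stated assumptions. The function names (build_inverted_index, sparse_candidates, maxsim, retrieve) and the in-memory dictionaries are hypothetical, the sparse score is assumed to be a plain dot product, and a real deployment would rely on an optimized sparse retrieval engine.

```python
from collections import defaultdict

def build_inverted_index(doc_sparse):
    """doc_sparse: {doc_id: {term_id: weight}} -> {term_id: [(doc_id, weight)]}."""
    index = defaultdict(list)
    for doc_id, vec in doc_sparse.items():
        for term, w in vec.items():
            index[term].append((doc_id, w))
    return index

def sparse_candidates(query_sparse, index, k):
    """Step 1: score documents by sparse dot product and keep the top k."""
    scores = defaultdict(float)
    for term, qw in query_sparse.items():
        for doc_id, dw in index.get(term, []):
            scores[doc_id] += qw * dw
    # Sort by descending score, breaking ties by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]

def maxsim(query_embs, doc_embs):
    """Late-interaction score: sum over query tokens of the best document-token match."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_embs) for q in query_embs)

def retrieve(query_sparse, query_embs, index, doc_embs, k=50):
    """Step 2: re-rank the sparse candidates exactly with MaxSim."""
    candidates = sparse_candidates(query_sparse, index, k)
    return sorted(candidates, key=lambda d: (-maxsim(query_embs, doc_embs[d]), d))

# Toy collection: sparse vectors for step 1, token embeddings for step 2.
doc_sparse = {"d1": {0: 1.0, 1: 0.5}, "d2": {1: 2.0}, "d3": {2: 1.0}}
doc_embs = {"d1": [[1.0, 0.0], [0.0, 1.0]], "d2": [[0.6, 0.8]], "d3": [[0.0, 1.0]]}
index = build_inverted_index(doc_sparse)
print(retrieve({0: 1.0, 1: 1.0}, [[1.0, 0.0], [0.0, 1.0]], index, doc_embs, k=2))
```

Here the sparse step ranks d2 ahead of d1, while MaxSim re-ranking reverses the order. This shows why the first stage only needs to place good documents in the candidate set, not order them perfectly.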
