Efficient Constant-Space Multi-Vector Retrieval

Sean MacAvaney, Antonio Mallia, Nicola Tonellotto

TL;DR

Efficient Constant-Space Multi-Vector Retrieval introduces ConstBERT, a method that compresses multi-vector document representations by projecting token-level embeddings into a fixed set of $C$ document vectors. A learned pooling mechanism produces these vectors, so every document has the same fixed-size representation. This keeps retrieval effectiveness close to ColBERT's while substantially reducing index size and memory usage. The method supports end-to-end training, integrates with reranking workflows, and shows strong results on the MSMARCO and BEIR benchmarks with substantial storage savings. It offers a practical path toward scalable, low-latency multi-vector retrieval for real-world deployments where storage and latency are critical.
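
The summary does not specify how the $C$ vectors are computed. The Python sketch below illustrates one plausible reading of "a learned pooling mechanism" and is not the authors' implementation. It assumes a learned C x n_max mixing matrix W applied over zero-padded (or truncated) token embeddings, followed by L2 normalization. The sketch then scores a query against the pooled vectors with ColBERT-style MaxSim late interaction. The names pool_to_fixed, maxsim, W and n_max are hypothetical, and W is random here rather than trained.

```python
import math
import random

def l2_normalize(v):
    # Scale a vector to unit length; an all-zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def pool_to_fixed(token_embs, W, n_max):
    """Map a variable number of token embeddings to exactly C vectors.

    token_embs: list of token vectors (each of length d).
    W: hypothetical learned C x n_max mixing matrix (one row per output vector).
    Returns C L2-normalized vectors of length d.
    """
    d = len(token_embs[0])
    kept = token_embs[:n_max]  # truncate documents longer than n_max tokens
    # Zero-pad so every document presents the same n_max x d input.
    padded = kept + [[0.0] * d for _ in range(n_max - len(kept))]
    pooled = []
    for row in W:
        # Each output vector is a learned weighted sum over token positions.
        vec = [sum(row[i] * padded[i][k] for i in range(n_max)) for k in range(d)]
        pooled.append(l2_normalize(vec))
    return pooled

def maxsim(query_embs, doc_embs):
    # Late interaction: each query vector keeps its best dot product over
    # the document vectors, and the per-query maxima are summed.
    return sum(
        max(sum(q * x for q, x in zip(qv, dv)) for dv in doc_embs)
        for qv in query_embs
    )

random.seed(0)
d, n_max, C = 4, 8, 2
W = [[random.gauss(0, 1) for _ in range(n_max)] for _ in range(C)]
doc = [l2_normalize([random.gauss(0, 1) for _ in range(d)]) for _ in range(5)]
query = [l2_normalize([random.gauss(0, 1) for _ in range(d)]) for _ in range(3)]

fixed = pool_to_fixed(doc, W, n_max)
score = maxsim(query, fixed)
print(len(fixed), len(fixed[0]), score)
```

Whatever the document length, pool_to_fixed returns C vectors. This fixed count is what makes the per-document storage constant. Scoring is unchanged from ColBERT-style MaxSim, except that it runs over C document vectors instead of one vector per token.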

Abstract

Multi-vector retrieval methods, exemplified by the ColBERT architecture, have shown substantial promise for retrieval by providing strong trade-offs in terms of retrieval latency and effectiveness. However, they come at a high cost in terms of storage, since a (potentially compressed) vector needs to be stored for every token in the input collection. To overcome this issue, we propose encoding documents into a fixed number of vectors, which are no longer necessarily tied to the input tokens. Beyond reducing storage costs, our approach has the advantage that document representations have a fixed size on disk, allowing for better OS paging management. Through experiments on the MSMARCO passage corpus and BEIR with the ColBERT-v2 architecture, a representative multi-vector ranking model architecture, we find that passages can be effectively encoded into a fixed number of vectors while retaining most of the original effectiveness.
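
Fixed-size document representations let an index locate any document with a single multiplication. A per-token index, by contrast, needs a separate offsets table because records vary in length. The sketch below illustrates this point with an assumed on-disk layout of C x d float16 values per document stored back to back. This layout is chosen for simplicity. It is not ColBERT-v2's residual compression or the paper's actual format. The helper names record_size, record_offset, write_index and read_doc are hypothetical.

```python
import struct

def record_size(C, d, bytes_per_value=2):
    # Every document stores exactly C * d values, so all records share one size.
    return C * d * bytes_per_value

def record_offset(doc_id, C, d, bytes_per_value=2):
    # Fixed-size records: the byte offset is computed, not looked up.
    return doc_id * record_size(C, d, bytes_per_value)

def write_index(docs, C, d):
    # docs: list of documents, each a list of C vectors of length d,
    # packed as little-endian float16 ('e' format) in document-id order.
    buf = bytearray()
    for vecs in docs:
        assert len(vecs) == C and all(len(v) == d for v in vecs)
        for v in vecs:
            buf += struct.pack(f"<{d}e", *v)
    return bytes(buf)

def read_doc(index, doc_id, C, d):
    # Seek directly to the document's record and decode its C vectors.
    start = record_offset(doc_id, C, d)
    raw = index[start:start + record_size(C, d)]
    flat = struct.unpack(f"<{C * d}e", raw)
    return [list(flat[i * d:(i + 1) * d]) for i in range(C)]

C, d = 2, 3
# Values chosen to be exactly representable in float16.
docs = [
    [[0.5, -1.0, 0.25], [1.0, 0.0, -0.5]],
    [[0.125, 2.0, -0.25], [0.75, 1.5, -2.0]],
]
index = write_index(docs, C, d)
print(len(index), record_offset(1, C, d), read_doc(index, 1, C, d))
```

Every record has the same length, so records also align predictably with OS pages. This regularity is the property the abstract links to better paging behavior.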

Paper Structure

This paper contains 12 sections, 3 equations, 3 tables.