Rethinking the Role of Token Retrieval in Multi-Vector Retrieval

Jinhyuk Lee, Zhuyun Dai, Sai Meher Karthik Duddu, Tao Lei, Iftekhar Naim, Ming-Wei Chang, Vincent Y. Zhao

TL;DR

This work tackles the inefficiency of multi-vector retrieval pipelines by rethinking token retrieval. It introduces XTR, a contextualized token retriever trained with an in-batch objective to prioritize the most informative document tokens, enabling scoring solely over retrieved tokens and removing the costly gathering stage. The approach yields substantial reductions in scoring FLOPs (roughly 4000x) while achieving state-of-the-art results on BEIR and LoTTE without distillation, and strong performance on MS MARCO and MIRACL in multilingual settings. Analyses show improved gold-token recall and contextual retrieval quality, validating the central claim that better token retrieval can drive competitive or superior ranking with a simpler, cheaper scoring stage.

Abstract

Multi-vector retrieval models such as ColBERT [Khattab and Zaharia, 2020] allow token-level interactions between queries and documents, and hence achieve state-of-the-art results on many information retrieval benchmarks. However, their non-linear scoring function cannot be scaled to millions of documents, necessitating a three-stage process for inference: retrieving initial candidates via token retrieval, accessing all token vectors, and scoring the initial candidate documents. The non-linear scoring function is applied over all token vectors of each candidate document, making the inference process complicated and slow. In this paper, we aim to simplify multi-vector retrieval by rethinking the role of token retrieval. We present XTR, ConteXtualized Token Retriever, which introduces a simple, yet novel, objective function that encourages the model to retrieve the most important document tokens first. The improvement to token retrieval allows XTR to rank candidates using only the retrieved tokens rather than all tokens in the document, and enables a newly designed scoring stage that is two-to-three orders of magnitude cheaper than that of ColBERT. On the popular BEIR benchmark, XTR advances the state-of-the-art by 2.8 nDCG@10 without any distillation. Detailed analysis confirms our decision to revisit the token retrieval stage, as XTR demonstrates much better recall in the token retrieval stage than ColBERT.

Paper Structure (37 sections, 11 equations, 8 figures, 12 tables)

Figures (8)

  • Figure 1: Overview of XTR. ColBERT uses a three-stage inference pipeline that combines (a) token retrieval, (b) gathering, and (c) scoring (\ref{sec:three-stage}). XTR leverages token retrieval for both training and inference. XTR efficiently obtains the score of each candidate document by applying $f_\text{XTR}$ (or $f_{\text{XTR}^\prime}$) to the retrieved tokens, completely removing the gathering stage (\ref{sec:approx}).
  • Figure 2: Density histogram of 4,000 token retrieval scores (cosine similarity). Training with $f_\text{ColBERT}$ (T5-ColBERT; \ref{sec:exp-setting}) causes many document tokens to have extremely high scores regardless of their actual relevance to the input query tokens. XTR mitigates this problem with a better training objective.
  • Figure 3: Comparison of $f_\text{ColBERT}$ in \ref{eq:som} and $f_{\text{XTR}^\prime}$ in \ref{eq:som-approx}. Assume that $D_a$ and $D_b$ were selected as initial candidate documents from the token retrieval stage. $f_\text{ColBERT}$ loads all token vectors of $D_a$ and $D_b$ and exhaustively recomputes pairwise token similarity to obtain the max values (red boxes). In contrast, $f_{\text{XTR}^\prime}$ does not load any token vectors and reuses the retrieval scores from the first-stage token retrieval. Assume that, with the top-2 token retrieval results, the first query token retrieved the max-scoring token of both $D_a$ and $D_b$, while the second query token retrieved two tokens from $D_a$ and none from $D_b$. We impute the missing similarity $m$ for $D_b$ (shown as a yellow dashed box) by finding its upper bound using the top-2 score (denoted as $s_2$) of the second query token (i.e., $m \leq s_2 \leq s_1$).
  • Figure 4: (top) Gold token retrieval performance of T5-ColBERT and XTR. We plot the probability that the document token retrieved at rank $k$ comes from the gold document. (bottom) Lexical token retrieval performance of T5-ColBERT and XTR. We plot the probability that the document token retrieved at rank $k$ is lexically identical to its query token.
  • Figure 5: Impact of training objectives and imputation methods comparing T5-ColBERT and XTR. For both models, we apply $f_{\text{XTR}^\prime}$ during inference. We report MRR@10 and Recall@1000 on the MS MARCO development set.
  • ...and 3 more figures
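
The scoring idea in the Figure 3 caption (reuse first-stage retrieval scores and impute any missing similarity with the lowest retrieved score of that query token) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `score_from_retrieved_tokens`, the input layout, the toy similarity values, and the averaging over query tokens are all hypothetical choices made for this example.

```python
from collections import defaultdict


def score_from_retrieved_tokens(retrieved):
    """Score candidate documents using only first-stage token retrieval results.

    `retrieved[i]` is the non-empty top-k list of (doc_id, similarity) pairs
    returned for query token i. This input layout is an assumption of the
    sketch, not an interface taken from the paper.
    """
    n = len(retrieved)

    # best[doc_id][i] = highest retrieved similarity between query token i
    # and any retrieved token of doc_id.
    best = defaultdict(dict)
    for i, hits in enumerate(retrieved):
        for doc_id, sim in hits:
            if sim > best[doc_id].get(i, float("-inf")):
                best[doc_id][i] = sim

    # Imputed value for query token i: its lowest retrieved (top-k-th) score,
    # which upper-bounds the similarity to any token that was not retrieved.
    imputed = [min(sim for _, sim in hits) for hits in retrieved]

    scores = {}
    for doc_id, per_token in best.items():
        total = sum(per_token.get(i, imputed[i]) for i in range(n))
        scores[doc_id] = total / n  # averaging is an assumption of this sketch
    return scores


# Toy version of the Figure 3 scenario with top-2 retrieval and made-up scores:
# query token 0 retrieves one token from each of D_a and D_b,
# query token 1 retrieves two tokens from D_a and none from D_b.
retrieved = [
    [("D_a", 0.875), ("D_b", 0.75)],
    [("D_a", 0.5), ("D_a", 0.25)],
]
scores = score_from_retrieved_tokens(retrieved)
print(scores)  # {'D_a': 0.6875, 'D_b': 0.5}
```

In this toy run, $D_b$ has no retrieved token for the second query token, so its missing similarity is imputed with that token's top-2 score (0.25 here). No document token vectors are loaded at any point, which is what makes the gathering stage unnecessary in this style of scoring.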