jina-reranker-v3: Last but Not Late Interaction for Listwise Document Reranking
Feng Wang, Yuqing Li, Han Xiao
TL;DR
jina-reranker-v3 tackles the efficiency–effectiveness bottleneck in document reranking by introducing last but not late interaction (LBNL), a listwise approach that performs cross-document attention during encoding, with the query and candidate documents sharing one context window. Built on a 0.6B-parameter Qwen3 backbone, it extracts contextual embeddings at designated token positions and passes them through a lightweight projector, scoring relevance by cosine similarity; this design allows end-to-end training with multiple complementary losses. A three-stage training regimen (foundation LoRA fine-tuning, context and hard-negative mining, and ensemble optimization), combined with multilingual training data, yields state-of-the-art BEIR performance (nDCG@10 of 61.94) and competitive results on MIRACL, MKQA, and CoIR, while the model remains smaller than competing rerankers. By letting documents exchange signals during encoding, the approach combines efficiency with expressive power, which matters for large-scale multilingual retrieval and code search. Future work includes improving robustness to prompt injection and deduplicating candidates via submodular optimization.
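The scoring head described above (designated-position embeddings, a lightweight projector, cosine similarity) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the released implementation: the hidden states at the designated token positions are taken as given arrays, the projector is assumed to be a two-layer MLP with ReLU shared by query and documents, the dimensions are toy values, and the function names are hypothetical.

```python
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors along the last axis to unit length."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def project(h: np.ndarray, w1, b1, w2, b2) -> np.ndarray:
    """Hypothetical two-layer MLP projector (Linear -> ReLU -> Linear)."""
    return np.maximum(h @ w1 + b1, 0.0) @ w2 + b2

def score_documents(query_hidden, doc_hidden, params):
    """Cosine similarity between the projected query and each projected document."""
    q = l2_normalize(project(query_hidden, *params))  # shape (proj_dim,)
    d = l2_normalize(project(doc_hidden, *params))    # shape (n_docs, proj_dim)
    return d @ q                                      # shape (n_docs,)

rng = np.random.default_rng(42)
hidden_dim, proj_dim, n_docs = 64, 32, 5  # toy sizes, not the real model's
params = (
    rng.normal(size=(hidden_dim, hidden_dim)) / np.sqrt(hidden_dim),
    np.zeros(hidden_dim),
    rng.normal(size=(hidden_dim, proj_dim)) / np.sqrt(hidden_dim),
    np.zeros(proj_dim),
)
# Stand-ins for hidden states read out at the designated token positions.
query_hidden = rng.normal(size=hidden_dim)
doc_hidden = rng.normal(size=(n_docs, hidden_dim))
doc_hidden[2] = query_hidden  # a document whose state matches the query exactly

scores = score_documents(query_hidden, doc_hidden, params)
ranking = np.argsort(-scores)  # document indices, most relevant first
```

Because the scores are cosine similarities they lie in [-1, 1], ranking reduces to a sort, and every step is differentiable, which is what makes end-to-end training of the projector together with the backbone possible.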
Abstract
jina-reranker-v3 is a 0.6B-parameter multilingual listwise reranker that introduces a novel "last but not late" interaction. Unlike late interaction models such as ColBERT, which encode documents separately before multi-vector matching, our approach applies causal self-attention over the query and all candidate documents within the same context window, enabling rich cross-document interaction before a contextual embedding is extracted from each document's final token. The model achieves state-of-the-art BEIR performance with 61.94 nDCG@10 while being significantly smaller than other models with comparable performance.
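To make the "last but not late" idea concrete, here is a toy NumPy sketch of a single causal self-attention layer over one shared context window. It rests on several assumptions that are not taken from the paper: the query tokens are placed first and the documents follow in order, token states are random stand-ins, there is one head and no positional encoding, and the helpers `build_layout` and `causal_self_attention` are hypothetical. The point it illustrates is that a document's final token attends to the query and to every earlier document, so the embedding read out there already carries cross-document signal, whereas later documents cannot affect earlier ones.

```python
import numpy as np

def build_layout(query_len: int, doc_lens: list[int]) -> tuple[int, list[int]]:
    """Total sequence length and index of each document's final token.

    Assumes (hypothetically) the layout [query tokens][doc 1][doc 2]...
    """
    pos = query_len
    ends = []
    for n in doc_lens:
        pos += n
        ends.append(pos - 1)  # last token of this document
    return pos, ends

def causal_self_attention(x, wq, wk, wv):
    """Single-head causal self-attention; returns outputs and attention weights."""
    t, d = x.shape
    q, k, v = x @ wq, x @ wk, x @ wv
    logits = q @ k.T / np.sqrt(d)
    future = np.triu(np.ones((t, t), dtype=bool), k=1)  # token i may not see j > i
    logits = np.where(future, -np.inf, logits)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
d = 16
query_len, doc_lens = 3, [4, 2, 5]
total, ends = build_layout(query_len, doc_lens)
x = rng.normal(size=(total, d))  # stand-in token states for the shared window
wq, wk, wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

out, attn = causal_self_attention(x, wq, wk, wv)
doc_embeddings = out[ends]  # one contextual vector per document (final token)

# Perturbing document 1 changes the last document's embedding,
# because the last document's final token attends to everything before it.
x_pert = x.copy()
x_pert[query_len:query_len + doc_lens[0]] += 1.0
out_pert, _ = causal_self_attention(x_pert, wq, wk, wv)
changed_last = not np.allclose(out_pert[ends[-1]], out[ends[-1]])

# Perturbing the last document leaves document 1's embedding unchanged (causal mask).
x_pert2 = x.copy()
x_pert2[ends[1] + 1:] += 1.0
out_pert2, _ = causal_self_attention(x_pert2, wq, wk, wv)
unchanged_first = np.allclose(out_pert2[ends[0]], out[ends[0]])
```

Under causal attention, position in the window determines how much context each document sees; this sketch does not claim to reproduce how the actual model orders the query and documents or which special tokens it reads embeddings from.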
