jina-reranker-v3: Last but Not Late Interaction for Listwise Document Reranking

Feng Wang, Yuqing Li, Han Xiao

TL;DR

jina-reranker-v3 tackles the efficiency–effectiveness bottleneck in document reranking by introducing last but not late interaction (LBNL), a listwise approach that performs cross-document attention during encoding within a shared context window. Built on a 0.6B-parameter Qwen3 backbone, it extracts contextual embeddings at designated token positions and uses a lightweight projector with cosine similarity to produce relevance scores, which allows end-to-end training with multiple complementary losses. A three-stage training regimen (foundation LoRA fine-tuning, context and hard negative mining, and ensemble optimization), combined with strong multilingual data, yields state-of-the-art BEIR performance (nDCG@10 of around 61.9) and competitive results on MIRACL, MKQA, and CoIR, while remaining parameter-efficient relative to larger models. By exposing cross-document signals during encoding, the approach bridges efficiency and expressive power, with practical impact for large-scale multilingual retrieval and code search. Future work includes robustness to prompt injections and deduplication via submodularity.

Abstract

jina-reranker-v3 is a 0.6B-parameter multilingual listwise reranker that introduces a novel "last but not late" interaction. Unlike late interaction models like ColBERT that encode documents separately before multi-vector matching, our approach applies causal attention between the query and all candidate documents in the same context window, enabling rich interactions before extracting contextual embeddings from each document's final token. The new model achieves state-of-the-art BEIR performance with 61.94 nDCG@10 while being significantly smaller than other models with comparable performance.
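
The scoring step described above can be illustrated with a minimal sketch. It is not the released implementation: the hidden states are hand-written stand-ins for the backbone output, the projector is assumed to be a single linear map `W` (the source only says "lightweight projector"), and the names `lbnl_scores`, `query_pos`, and `doc_pos` are hypothetical. It shows only the last step: take one hidden state per document and one for the query from the same forward pass, project them, and rank by cosine similarity.

```python
import math


def project(vec, W):
    """Apply a linear projector W (d_model x d_proj) to one hidden state."""
    return [sum(v * w for v, w in zip(vec, col)) for col in zip(*W)]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def lbnl_scores(hidden, query_pos, doc_pos, W):
    """Score every document against the query from one shared sequence.

    hidden:    list of per-token hidden states for the joint sequence
    query_pos: index of the token whose state represents the query (assumed)
    doc_pos:   index of the final token of each document
    W:         hypothetical linear projector weights
    """
    q = project(hidden[query_pos], W)
    return [cosine(q, project(hidden[p], W)) for p in doc_pos]


# Toy hidden states for a 6-token sequence holding two documents and a query.
hidden = [
    [0.1, 0.2, 0.3],
    [1.0, 0.0, 0.5],  # final token of document 0
    [0.3, 0.1, 0.0],
    [0.0, 1.0, 0.5],  # final token of document 1
    [0.2, 0.2, 0.2],
    [0.9, 0.1, 0.4],  # query token
]
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

scores = lbnl_scores(hidden, query_pos=5, doc_pos=[1, 3], W=W)
# Document indices ordered from most to least relevant.
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```

In the actual model the hidden states come from causal attention over the query and all candidates together, so each document's embedding already reflects the other documents in the window. A late interaction model would instead compute each document's representation in a separate forward pass.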

Paper Structure

This paper contains 15 sections, 3 equations, 1 figure, 6 tables.

Figures (1)

  • Figure 1: Architecture of jina-reranker-v3 (https://huggingface.co/jinaai/jina-reranker-v3), showing the transformer backbone with special token positions for embedding extraction. The model processes multiple documents and the query in one context window, extracting contextual embeddings at designated token positions for similarity computation.