Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever
Rohan Jha, Bo Wang, Michael Günther, Georgios Mastrapas, Saba Sturua, Isabelle Mohr, Andreas Koukounas, Mohammad Kalim Akram, Nan Wang, Han Xiao
TL;DR
Jina-ColBERT-v2 advances multilingual dense retrieval by combining a RoBERTa-based backbone, which uses rotary positional embeddings and flash attention, with the scalable ColBERT late-interaction framework. Training proceeds in three stages: weakly supervised pair training, triplet fine-tuning with hard negatives, and cross-encoder distillation. A family of projection heads, learned with a Matryoshka Representation Loss, allows the embedding size to be selected at inference time. Trained on diverse multilingual data, the model achieves competitive English performance on BEIR and strong multilingual results on MIRACL and mMARCO, while smaller embedding dimensions and flexible head selection yield efficiency gains. Ablation studies highlight the impact of architectural choices and query augmentation, and they point to future work on integrating inference optimizations into training.
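The Matryoshka idea behind inference-time size selection can be sketched as follows. This is a minimal illustration, not the authors' training code. It assumes that the smaller embedding sizes are nested prefixes of the full token embedding (as in standard Matryoshka Representation Learning), and it uses an InfoNCE-style contrastive loss over late-interaction (MaxSim) scores. The dimensions, temperature, and helper names (`truncate`, `matryoshka_loss`, and so on) are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def normalize(v: Sequence[float]) -> List[float]:
    # L2-normalize a vector (a zero vector is left unchanged).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def truncate(tokens: Sequence[Sequence[float]], dim: int) -> Matrix:
    # Hypothetical nested-prefix head: keep the first `dim` coordinates
    # of every token vector, then re-normalize.
    return [normalize(list(t[:dim])) for t in tokens]


def maxsim(query: Matrix, doc: Matrix) -> float:
    # Late interaction: best-matching document token per query token, summed.
    return sum(max(sum(a * b for a, b in zip(q, d)) for d in doc) for q in query)


def contrastive_loss(query: Matrix, docs: List[Matrix], pos: int,
                     temperature: float = 0.05) -> float:
    # Softmax cross-entropy over MaxSim scores; docs[pos] is the relevant document.
    logits = [maxsim(query, d) / temperature for d in docs]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(s - m) for s in logits))
    return log_z - logits[pos]


def matryoshka_loss(query: Matrix, docs: List[Matrix], pos: int,
                    dims: Sequence[int] = (4, 2)) -> float:
    # Sum the same loss at every nested size so each prefix stays usable on its own.
    total = 0.0
    for dim in dims:
        total += contrastive_loss(truncate(query, dim),
                                  [truncate(d, dim) for d in docs], pos)
    return total


# Toy 4-dimensional token embeddings standing in for encoder outputs (hypothetical).
query = [[1.0, 0.0, 0.5, 0.0]]
docs = [[[1.0, 0.0, 0.5, 0.0]], [[0.0, 1.0, 0.0, 0.5]]]
loss_correct = matryoshka_loss(query, docs, pos=0)
loss_wrong = matryoshka_loss(query, docs, pos=1)
print(loss_correct < loss_wrong)  # True
```

At inference one would keep a single size, for example `truncate(doc_tokens, 2)` in this toy setting. Index storage grows linearly with the per-token dimension, so halving the dimension halves the index, at some cost in ranking quality.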
Abstract
Multi-vector dense models, such as ColBERT, have proven highly effective in information retrieval. ColBERT's late interaction scoring approximates the joint query-document attention seen in cross-encoders while maintaining inference efficiency closer to traditional dense retrieval models, thanks to its bi-encoder architecture and recent optimizations in indexing and search. In this work, we propose a number of incremental improvements to the ColBERT model architecture and training pipeline, using methods shown to work in the more mature single-vector embedding model training paradigm, particularly those that apply to heterogeneous multilingual data or that boost efficiency with little loss in retrieval quality. Our new model, Jina-ColBERT-v2, demonstrates strong performance across a range of English and multilingual retrieval tasks.
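To make late-interaction scoring concrete, here is a minimal sketch of ColBERT's MaxSim operator in plain Python. It assumes that query and document token embeddings are already L2-normalized, so that dot products are cosine similarities. The toy vectors and the in-memory `index` dict are hypothetical stand-ins for encoder outputs and for a real ColBERT index, which would add compression and approximate search.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: Matrix, doc_tokens: Matrix) -> float:
    # For each query token, take its most similar document token,
    # then sum these maxima over all query tokens.
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def rank(query_tokens: Matrix, index: Dict[str, Matrix]) -> List[Tuple[str, float]]:
    # Score every pre-encoded document and sort by descending score.
    scored = [(doc_id, maxsim_score(query_tokens, toks)) for doc_id, toks in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy unit vectors standing in for encoder outputs (hypothetical).
index = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0]],
    "doc_b": [[0.6, 0.8], [0.8, 0.6]],
}
query = [[1.0, 0.0], [0.0, 1.0]]
print(rank(query, index))  # [('doc_a', 2.0), ('doc_b', 1.6)]
```

Because each document's token embeddings are computed without seeing the query (the bi-encoder property), they can be encoded once offline. Only the inexpensive MaxSim reduction runs per query, which is the source of the efficiency advantage over cross-encoders, which must jointly encode every query-document pair.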
