ColBERT-Zero: To Pre-train Or Not To Pre-train ColBERT models
Antoine Chaffin, Luca Arnaboldi, Amélie Chatelain, Florent Krzakala
TL;DR
This work evaluates whether pre-training ColBERT multi-vector models is advantageous over relying solely on knowledge distillation (KD) on top of a dense, single-vector model. It demonstrates that a fully ColBERT-pre-trained model, ColBERT-Zero, trained only on public data, can outperform state-of-the-art baselines that rely on stronger but closed data, setting a new state of the art for models of its size. A supervised contrastive phase before KD can closely approximate full pre-training at a fraction of the cost, offering a practical alternative when large-scale unsupervised pre-training is prohibitive. The study also highlights the critical role of aligning the pre-training and fine-tuning setups, particularly regarding prompts, and provides public checkpoints and code to facilitate further exploration of multi-vector pre-training.
Abstract
Current state-of-the-art multi-vector models are obtained through a small Knowledge Distillation (KD) training step on top of strong single-vector models, leveraging the large-scale pre-training of these models. In this paper, we study the pre-training of multi-vector models and show that large-scale multi-vector pre-training yields much stronger multi-vector models. Notably, a fully ColBERT-pre-trained model, ColBERT-Zero, trained only on public data, outperforms GTE-ModernColBERT as well as its base model, GTE-ModernBERT, which leverages closed and much stronger data, setting a new state of the art for models of this size. We also find that, although performing only a small KD step is not enough to achieve results close to full pre-training, adding a supervised step beforehand yields much closer performance while skipping the most costly unsupervised phase. Finally, we find that aligning the fine-tuning and pre-training setups is crucial when repurposing existing models. To enable exploration of our results, we release various checkpoints as well as the code used to train them.
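
For readers unfamiliar with the single-vector versus multi-vector distinction drawn above, the following minimal sketch contrasts the two scoring schemes. It assumes the standard ColBERT late-interaction (MaxSim) operator with cosine similarity; the function names and the toy 2-dimensional embeddings are hypothetical and chosen for illustration only, and this is not the training or inference code released with the paper.

```python
from math import sqrt


def normalize(vec):
    """Scale a vector to unit L2 norm so that dot products are cosine similarities."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def single_vector_score(query_vec, doc_vec):
    """Dense (single-vector) scoring: one embedding per text, one cosine similarity."""
    return dot(normalize(query_vec), normalize(doc_vec))


def maxsim_score(query_tokens, doc_tokens):
    """Multi-vector (late-interaction) scoring, assuming the usual MaxSim operator:
    each query token embedding is matched to its most similar document token
    embedding, and the per-token maxima are summed."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)


# Toy example: two query token embeddings, two document token embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
document = [[1.0, 0.0], [0.0, 2.0]]
score = maxsim_score(query, document)
```

In the multi-vector case the score depends on token-level matches rather than on a single pooled representation, which is the property that a KD-only step on top of a single-vector model has to learn late and that multi-vector pre-training exposes from the start.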
