ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval

Quentin Macé; António Loison; Manuel Faysse

TL;DR

ViDoRe Benchmark V2 addresses saturation in prior visual retrieval benchmarks by introducing challenging, realistic retrieval scenarios including blind contextual querying, long-form and cross-document queries, and a hybrid query-generation pipeline across four multilingual datasets. The approach emphasizes reducing extractive bias and broadening evaluation to multilingual and cross-document contexts, with BeIR-compatible tooling and plans to evolve as a living benchmark. Key findings show substantial headroom for advancement, especially in non-English generalization and cross-domain tasks, and indicate that larger models offer performance gains at higher computational cost, while human-labeled data provides more discriminative signals. The benchmark is positioned to impact real-world visual retrieval research by enabling community-driven dataset growth and ongoing method development.

Abstract

The ViDoRe Benchmark V1 was approaching saturation, with top models exceeding 90% nDCG@5, which limited its ability to distinguish improvements between models. ViDoRe Benchmark V2 introduces realistic, challenging retrieval scenarios via blind contextual querying, long and cross-document queries, and a hybrid synthetic and human-in-the-loop query generation process. It comprises four diverse, multilingual datasets and provides clear evaluation instructions. Initial results show substantial room for improvement and offer insights into model generalization and multilingual capability. The benchmark is designed as a living resource, inviting community contributions so that it stays relevant for future evaluations.
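For readers unfamiliar with the nDCG@5 metric cited above, the following is a minimal sketch of how it can be computed for a single query. It is not the benchmark's official evaluation code. The function names `dcg_at_k` and `ndcg_at_k` are hypothetical, and the sketch assumes the linear-gain variant of DCG and assumes that the input list holds relevance labels for the full ranking, so that the ideal ordering can be derived from it.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (linear gain, assumed variant)."""
    # rank is 0-based, so the discount for the first item is log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=5):
    """nDCG@k for one query.

    ranked_relevances: relevance labels in the order the retriever ranked the
    documents (assumed to cover every relevant document for the query).
    """
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        # No relevant document exists for this query; score it as 0 by convention.
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


if __name__ == "__main__":
    # One relevant page, retrieved at rank 2 out of 5.
    print(ndcg_at_k([0, 1, 0, 0, 0], k=5))
```

A benchmark-level score is typically the mean of this per-query value over all queries, so a reported 90% nDCG@5 means the relevant pages are, on average, placed very near the top of the ranking.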
