Overview of the TREC 2025 RAGTIME Track
Dawn Lawrie, Sean MacAvaney, James Mayfield, Luca Soldaini, Eugene Yang, Andrew Yates
TL;DR
This paper presents the inaugural RAGTIME track at TREC, which focuses on retrieval-augmented, multilingual long-form report generation across Arabic, Chinese, English, and Russian. It defines three tasks (Multilingual Report Generation, Monolingual English Report Generation, and Multilingual Information Retrieval), along with a shared document collection and an evaluation framework inspired by prior work on ARGUE-based assessment. The assessment workflow comprises four phases (document relevance, nugget creation, citation assessment, and nugget matching), complemented by automatic evaluation using AutoARGUE and a dedicated retrieval service; development data from NeuCLIR 2024 supports task calibration. Results from 13 teams and 125 runs show strong sentence grounding but relatively low nugget coverage, underscoring the influence of retrieval and the need for better nugget capture. An expansion planned for 2026 will add an Autonuggetization task. Overall, RAGTIME establishes a foundation for multilingual RAG research and reusable evaluation resources, signaling continued development and broader adoption in future years.
Abstract
The principal goal of the RAG TREC Instrument for Multilingual Evaluation (RAGTIME) track at TREC is to study report generation from multilingual source documents. The track has created a document collection containing Arabic, Chinese, English, and Russian news stories. RAGTIME includes three task types: Multilingual Report Generation, English Report Generation, and Multilingual Information Retrieval (MLIR). A total of 125 runs were submitted across the three tasks by 13 participating teams (and, as baselines, by the track coordinators). This overview describes the three tasks and presents the available results.
