
LongEval at CLEF 2025: Longitudinal Evaluation of IR Model Performance

Matteo Cancellieri, Alaa El-Ebshihy, Tobias Fink, Petra Galuščáková, Gabriela Gonzalez-Saez, Lorraine Goeuriot, David Iommi, Jüri Keller, Petr Knoth, Philippe Mulhem, Florina Piroi, David Pride, Philipp Schaer

TL;DR

The paper presents the third LongEval edition at CLEF 2025, which focuses on temporal persistence in IR: how ranking quality degrades as test data diverge in time from the training data. It introduces two tasks, LongEval-WebRetrieval and LongEval-SciRetrieval, built on snapshots collected at successive points in time, and measures robustness with metrics such as normalized Discounted Cumulative Gain ($nDCG$) and Relative $nDCG$ Drop ($RnD$). The training data grow to about 30 million documents and 15,000 queries. Web relevance signals are derived with the simplified Dynamic Bayesian Network ($sDBN$) click model, scholarly relevance signals come from CORE click logs, and several test sets at different time lags are planned. The timeline foresees data releases in early 2025, a March–May submission window, and a workshop at CLEF 2025 to broaden engagement in temporal IR research.
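The summary does not spell out how the $sDBN$ relevance signals are computed. As a rough illustration only, the sketch below implements the standard simplified DBN estimator for one query: attractiveness is clicks divided by the number of times a document was shown at or above the last click, satisfaction is last clicks divided by clicks, and relevance is their product. The function name `sdbn_relevance`, the log format (displayed ranking plus set of clicked document ids), and the choice to skip sessions without clicks are assumptions; how the lab turns these scores into graded relevance labels is not shown.

```python
from collections import defaultdict


def sdbn_relevance(sessions):
    """Estimate simplified-DBN relevance for the documents of one query.

    Hypothetical log format: each session is (ranking, clicked), where
    ranking lists doc ids in display order and clicked is a set of doc ids.
    """
    examined = defaultdict(int)     # times shown at or above the last click
    clicks = defaultdict(int)       # times clicked
    last_clicks = defaultdict(int)  # times it was the session's last click

    for ranking, clicked in sessions:
        click_positions = [i for i, doc in enumerate(ranking) if doc in clicked]
        if not click_positions:
            continue  # assumption: click-less sessions are ignored
        last = click_positions[-1]
        # With gamma = 1, the user is assumed to examine everything
        # down to (and including) the last clicked result.
        for doc in ranking[: last + 1]:
            examined[doc] += 1
        for i in click_positions:
            clicks[ranking[i]] += 1
        last_clicks[ranking[last]] += 1

    relevance = {}
    for doc, n_examined in examined.items():
        attractiveness = clicks[doc] / n_examined
        satisfaction = last_clicks[doc] / clicks[doc] if clicks[doc] else 0.0
        relevance[doc] = attractiveness * satisfaction
    return relevance
```

For example, with the three toy sessions `(["a", "b", "c"], {"b"})`, `(["a", "b", "c"], {"a", "c"})` and `(["b", "a", "c"], {"b"})`, document `a` gets 0.0 (clicked but never the last click), `b` gets 2/3 and `c` gets 1.0.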

Abstract

This paper presents the third edition of the LongEval Lab, part of the CLEF 2025 conference, which continues to explore the challenges of temporal persistence in Information Retrieval (IR). The lab features two tasks designed to provide researchers with test data that reflect the evolving nature of user queries and document relevance over time. By evaluating how model performance degrades as test data diverge temporally from training data, LongEval seeks to advance the understanding of temporal dynamics in IR systems. The 2025 edition aims to engage the IR and NLP communities in developing adaptive models that maintain retrieval quality over time in the domains of web search and scientific retrieval.
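To make the notion of performance degradation concrete, the sketch below computes nDCG@k with linear gains (as trec_eval does) and a relative drop between an earlier and a later test snapshot. It assumes RnD is defined as $(nDCG_{earlier} - nDCG_{later}) / nDCG_{earlier}$, so a positive value means retrieval quality fell over time; the function names and the cutoff k = 10 are illustrative choices, not the lab's official evaluation code.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranking, qrels, k=10):
    """nDCG@k for one query; qrels maps doc id -> graded relevance."""
    gains = [qrels.get(doc, 0) for doc in ranking[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = dcg(ideal)
    return dcg(gains) / idcg if idcg > 0 else 0.0


def relative_ndcg_drop(ndcg_earlier, ndcg_later):
    """Assumed RnD: relative loss from the earlier to the later snapshot."""
    if ndcg_earlier == 0:
        raise ValueError("RnD is undefined when the earlier nDCG is 0")
    return (ndcg_earlier - ndcg_later) / ndcg_earlier
```

In use, one would average `ndcg_at_k` over all queries of each snapshot and pass the two means to `relative_ndcg_drop`; for illustrative inputs of 0.5 and 0.4 it returns 0.2, i.e. a 20% relative loss.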

Paper Structure

This paper contains 8 sections and 1 figure.

Figures (1)

  • Figure 1: Sample result from CORE for a search for "open science"