ACL-rlg: A Dataset for Reading List Generation
Julien Aubert-Béduchaud, Florian Boudin, Béatrice Daille, Richard Dufour
TL;DR
The paper addresses how researchers can quickly become familiar with a new field by introducing ACL-rlg, the largest open expert-curated reading list dataset, derived from ACL tutorials. It formalizes reading list generation as an ordered retrieval task, defines evaluation benchmarks, and provides baselines built on search engines, retrieval models, and large language models. GPT-4o performs best among the tested systems, but overall results remain limited and raise concerns about data contamination and hallucinations, which underscores the need for retrieval-augmented approaches and ordering-aware evaluation. The work contributes a benchmark, releases code and data under the MIT license, and points toward future improvements in reading list generation and evaluation metrics.
Abstract
Familiarizing oneself with a new scientific field and its existing literature can be daunting due to the large number of available articles. Curated lists of academic references, or reading lists, compiled by experts, offer a structured way to gain a comprehensive overview of a domain or a specific scientific challenge. In this work, we introduce ACL-rlg, the largest open expert-annotated reading list dataset. We also provide multiple baselines for evaluating reading list generation and formally define it as a retrieval task. Our qualitative study shows that traditional scholarly search engines and indexing methods perform poorly on this task, and that GPT-4o, despite showing better results, exhibits signs of potential data contamination.
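To make the retrieval framing concrete, the sketch below scores a system's ranked list of papers against an expert reading list using set-based precision, recall, and F1. It is only an illustration under stated assumptions: the function name `score_reading_list`, the paper identifiers, and the choice of metrics are hypothetical and are not taken from the paper, whose exact evaluation protocol is not described in this summary. The sketch also ignores the order of the expert list, which an ordering-aware metric would need to take into account.

```python
from typing import Dict, Optional, Sequence


def score_reading_list(
    predicted: Sequence[str],
    gold: Sequence[str],
    k: Optional[int] = None,
) -> Dict[str, float]:
    """Score a ranked prediction against an expert reading list.

    Hypothetical helper: the metric choice (set-based P/R/F1 with an
    optional cutoff k) is an assumption, not the paper's protocol.
    """
    top = list(predicted) if k is None else list(predicted[:k])

    # Remove duplicate identifiers while keeping the original rank order.
    seen = set()
    ranked = []
    for paper_id in top:
        if paper_id not in seen:
            seen.add(paper_id)
            ranked.append(paper_id)

    gold_set = set(gold)
    hits = sum(1 for paper_id in ranked if paper_id in gold_set)

    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    expert_list = ["p1", "p2", "p3"]
    system_output = ["p1", "p9", "p2", "p7"]
    print(score_reading_list(system_output, expert_list))
    print(score_reading_list(system_output, expert_list, k=2))
```

The cutoff `k` mimics the common practice of evaluating only the top of a ranked list; whether a cutoff is used, and at what value, is likewise an assumption here.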
