QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension
Anna Rogers, Matt Gardner, Isabelle Augenstein
TL;DR
This survey provides the most comprehensive catalog of question answering (QA) and reading comprehension (RC) datasets to date, organizing hundreds of resources through a multi-dimensional taxonomy that separates question format, answer form, evidence modality, conversational dynamics, domain, and language. It proposes a new taxonomy of QA/RC skills along orthogonal dimensions (inference type, retrieval, input interpretation, world modeling, multi-step reasoning) to better capture the diverse reasoning demands across datasets, and it highlights the gap between what datasets are designed to test and true human-like reasoning. The authors point to the field's over-focus on English-language data, the rising importance of multimodal and conversational QA, and the need for robust evaluation of whether models reason about content or merely exploit dataset artifacts. They call for new data in underrepresented domains and languages, better annotation practices, and principled analyses to enable safer deployment and broader real-world impact.
Abstract
Alongside huge volumes of research on deep learning models in NLP in recent years, there has also been much work on the benchmark datasets needed to track modeling progress. Question answering and reading comprehension have been particularly prolific in this regard, with over 80 new datasets appearing in the past two years. This study is the largest survey of the field to date. We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae for future work. We further discuss the current classifications of "skills" that question answering/reading comprehension systems are supposed to acquire, and propose a new taxonomy. The supplementary materials survey the current multilingual resources and monolingual resources for languages other than English, and we discuss the implications of over-focusing on English. The study is aimed both at practitioners looking for pointers to the wealth of existing data and at researchers working on new resources.
