PeerQA: A Scientific Question Answering Dataset from Peer Reviews
Tim Baumgärtner, Ted Briscoe, Iryna Gurevych
TL;DR
PeerQA introduces a real-world, document-level QA dataset for science by sourcing questions from peer reviews and collecting author-provided answers. The dataset supports three practical tasks (evidence retrieval, answerability classification, and free-form answer generation) and includes 579 labeled QA pairs from 208 papers plus 12k unlabeled questions. Analyses show that decontextualization improves retrieval at the paragraph level and that long papers (about 12k tokens on average) pose challenges for generation, though retrieval-augmented generation with top passages often outperforms full-document context. The work provides baselines, a comprehensive analysis, and open-source code and data to drive future research in long-context scientific QA.
Abstract
We present PeerQA, a real-world, scientific, document-level Question Answering (QA) dataset. PeerQA questions are sourced from peer reviews, which contain questions that reviewers raised while thoroughly examining the scientific article. Answers have been annotated by the original authors of each paper. The dataset contains 579 QA pairs from 208 academic articles, the majority from ML and NLP, with a subset from other scientific communities such as Geoscience and Public Health. PeerQA supports three critical tasks for developing practical QA systems: evidence retrieval, unanswerable question classification, and answer generation. We provide a detailed analysis of the collected dataset and conduct experiments establishing baseline systems for all three tasks. Our experiments and analyses reveal the need for decontextualization in document-level retrieval: even simple decontextualization approaches consistently improve retrieval performance across architectures. On answer generation, PeerQA serves as a challenging benchmark for long-context modeling, as the papers have an average length of 12k tokens. Our code and data are available at https://github.com/UKPLab/peerqa.
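
To make the idea of "simple decontextualization" concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes one plausible scheme, prepending the paper title to each paragraph, and scores passages with a toy token-overlap function standing in for a real retriever. The function names (`decontextualize`, `overlap_score`, `rank`) and the example title, paragraphs, and question are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def decontextualize(title, paragraph):
    """Assumed scheme: prepend the paper title so that an isolated
    paragraph carries some document-level context."""
    return f"{title}. {paragraph}"


def overlap_score(query, passage):
    """Toy relevance score: number of query tokens matched in the passage.
    This stands in for a real sparse or dense retriever."""
    q = Counter(tokenize(query))
    p = Counter(tokenize(passage))
    return sum(min(q[t], p[t]) for t in q)


def rank(query, passages):
    """Return passage indices sorted from most to least relevant."""
    return sorted(
        range(len(passages)),
        key=lambda i: overlap_score(query, passages[i]),
        reverse=True,
    )


# Hypothetical toy data, not taken from PeerQA.
title = "A Toy Study of Graph Pruning"
paragraphs = [
    "It reduces memory use by half on the benchmark.",
    "We thank the reviewers for their comments.",
]
query = "Does graph pruning reduce memory use?"

raw_score = overlap_score(query, paragraphs[0])
decontextualized = [decontextualize(title, p) for p in paragraphs]
decon_score = overlap_score(query, decontextualized[0])
ranking = rank(query, decontextualized)
```

In this toy case the first paragraph refers to the method only as "It", so the raw paragraph matches fewer query terms than its decontextualized version. A reviewer's question typically names the method or paper topic, which is why restoring that context to each passage can help retrieval.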
