How Grounded is Wikipedia? A Study on Structured Evidential Support and Retrieval
William Walden, Kathryn Ricci, Miriam Wanner, Zhengping Jiang, Chandler May, Rongkun Zhou, Benjamin Van Durme
TL;DR
This work investigates how well Wikipedia claims are grounded, distinguishing lead from body content and internal from external grounding. It introduces PeopleProfiles, a large-scale dataset of fine-grained, multi-level evidential annotations that connect lead claims to body evidence and body claims to external sources, with scalar support labels and automated GPT-4-based annotation validated against human judgments. Key findings: about 22% of lead claims lack support in the article body, and about 30% of body claims lack support in their scrapable (publicly accessible) sources, highlighting gaps in real-world citation practices. Retrieval of complex, multi-premise evidence remains challenging: reasoning-based rerankers substantially improve evidence retrieval over traditional methods but still fall short. The dataset is released to accelerate research in claim verification and Wikipedia-driven NLP applications.
Abstract
Wikipedia is a critical resource for modern NLP, serving as a rich repository of up-to-date and citation-backed information on a wide variety of subjects. The reliability of Wikipedia -- its groundedness in its cited sources -- is vital to this purpose. This work analyzes both how grounded Wikipedia is and how readily fine-grained grounding evidence can be retrieved. To this end, we introduce PeopleProfiles -- a large-scale, multi-level dataset of claim support annotations on biographical Wikipedia articles. We show that: (1) ~22% of claims in Wikipedia lead sections are unsupported by the article body; (2) ~30% of claims in the article body are unsupported by their publicly accessible sources; and (3) real-world Wikipedia citation practices often differ from documented standards. Finally, we show that complex evidence retrieval remains a challenge -- even for recent reasoning rerankers.
