How Grounded is Wikipedia? A Study on Structured Evidential Support and Retrieval

William Walden, Kathryn Ricci, Miriam Wanner, Zhengping Jiang, Chandler May, Rongkun Zhou, Benjamin Van Durme

TL;DR

This work investigates how well Wikipedia claims are grounded, distinguishing lead from body content and internal from external grounding. It introduces PeopleProfiles, a large-scale dataset of fine-grained, multi-level evidential annotations that link lead claims to body evidence and body claims to external sources. The annotations carry scalar support labels and are produced by automated GPT-4-based annotation validated against human annotators. Key findings: about 22% of lead claims lack grounding in the article body, and around 30% of body claims lack grounding in their scrapable sources, highlighting gaps in real-world citation practices. Retrieval of complex, multi-premise evidence remains challenging even for modern reasoning-based rerankers, although these rerankers substantially improve evidence retrieval over traditional methods. The dataset is released to accelerate research on claim verification and Wikipedia-driven NLP applications.
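The multi-level structure described above can be pictured as a small data model. This is a minimal sketch under stated assumptions, not the PeopleProfiles release format: the class and field names, the helper `unsupported_fraction`, and the [0, 1] support scale (1.0 meaning fully supported) are all hypothetical, and the toy article is invented for illustration rather than drawn from the dataset.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One decomposed claim with a scalar support score.

    Hypothetical schema: `support` is assumed to lie in [0, 1],
    with 1.0 meaning fully supported by the linked evidence.
    """
    text: str
    support: float
    evidence_ids: list[str] = field(default_factory=list)


@dataclass
class Article:
    # Lead claims are grounded internally, in body sentences.
    lead_claims: list[Claim]
    # Body claims are grounded externally, in cited-source sentences.
    body_claims: list[Claim]


def unsupported_fraction(claims: list[Claim], threshold: float = 1.0) -> float:
    """Fraction of claims whose support score is below `threshold`."""
    if not claims:
        return 0.0
    return sum(c.support < threshold for c in claims) / len(claims)


# Toy article (invented values, not dataset statistics).
article = Article(
    lead_claims=[
        Claim("X was born in 1950.", 1.0, ["body:s3"]),
        Claim("X is best known for Y.", 0.0),
    ],
    body_claims=[
        Claim("X was born in 1950.", 1.0, ["src1:s4"]),
        Claim("X studied at Z.", 0.5, ["src2:s1"]),
        Claim("X won prize P.", 1.0, ["src3:s9"]),
        Claim("X moved to W.", 0.0),
    ],
)

print(unsupported_fraction(article.lead_claims))  # 0.5
print(unsupported_fraction(article.body_claims))  # 0.5
```

With the assumed threshold of 1.0, partially supported claims count as not fully grounded; the paper's own criterion for "unsupported" may differ.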

Abstract

Wikipedia is a critical resource for modern NLP, serving as a rich repository of up-to-date and citation-backed information on a wide variety of subjects. The reliability of Wikipedia -- its groundedness in its cited sources -- is vital to this purpose. This work analyzes both how grounded Wikipedia is and how readily fine-grained grounding evidence can be retrieved. To this end, we introduce PeopleProfiles -- a large-scale, multi-level dataset of claim support annotations on biographical Wikipedia articles. We show that: (1) ~22% of claims in Wikipedia lead sections are unsupported by the article body; (2) ~30% of claims in the article body are unsupported by their publicly accessible sources; and (3) real-world Wikipedia citation practices often differ from documented standards. Finally, we show that complex evidence retrieval remains a challenge -- even for recent reasoning rerankers.
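The retrieval finding can be made concrete with a generic recall@k computation. This is an illustrative assumption: the summary does not say which metrics the paper reports, and the function names and toy sentence IDs are hypothetical. The sketch contrasts per-item recall with a stricter all-premises check, which shows why multi-premise evidence is harder: a claim is fully covered only if every required premise is ranked within the top k.

```python
def recall_at_k(ranked_ids: list[str], gold_ids: set[str], k: int) -> float:
    """Fraction of gold evidence items found among the top-k retrieved items."""
    if not gold_ids:
        raise ValueError("gold evidence set must be non-empty")
    return len(set(ranked_ids[:k]) & gold_ids) / len(gold_ids)


def all_premises_at_k(ranked_ids: list[str], gold_ids: set[str], k: int) -> bool:
    """True only if every gold premise appears among the top-k items."""
    return gold_ids <= set(ranked_ids[:k])


# Toy ranking produced by some retriever or reranker (invented IDs).
ranking = ["s4", "s1", "s9", "s2", "s7"]

single = {"s1"}             # claim needing one evidence sentence
multi = {"s1", "s2", "s8"}  # claim needing three premises

print(recall_at_k(ranking, single, 3), all_premises_at_k(ranking, single, 3))  # 1.0 True
print(recall_at_k(ranking, multi, 5), all_premises_at_k(ranking, multi, 5))    # 0.6666666666666666 False
```

Even with two of three premises retrieved, the multi-premise claim is still not fully covered, whereas a single-premise claim is covered as soon as its one sentence is ranked highly.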

Paper Structure

This paper contains 34 sections, 1 equation, 9 figures, 8 tables.

Figures (9)

  • Figure 1: PeopleProfiles features fine-grained, multi-level evidential relations and scalar support labels (not shown) on Wikipedia articles: from cited sources to claims in the article body (bottom arrow), and from the body to claims in the article lead (top arrow).
  • Figure 2: A more detailed view of the multi-level structure of PeopleProfiles annotations. Claims in the lead of a Wikipedia article (top left) are supported by sentences in the body (bottom left), whose claims in turn are supported by evidence in cited sources (right). Prior work on Wikipedia claim verification has not considered this structure.
  • Figure 3: Kernel density estimation plots for Wikipedia lead/body claim support in the PeopleProfiles dev split. We find that many claims are not fully grounded.
  • Figure 4: Distribution of overall evidence scores for PeopleProfiles dev split body evidence with mean- (blue) and product-based (orange) aggregation of body claim support scores for each evidence sentence.
  • Figure 5: Annotation interface for the human pilot annotation. A detailed description is given in the appendix section on claim decomposition.
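Figure 4 contrasts two ways of turning the support scores of the claims in one body sentence into a single overall evidence score. Below is a minimal sketch of mean- and product-based aggregation, assuming scores normalized to [0, 1]; the function name `aggregate_support` and the toy scores are hypothetical, and the paper's exact definition may differ in details this summary does not show.

```python
import math
from statistics import fmean


def aggregate_support(scores: list[float], method: str = "mean") -> float:
    """Combine the support scores of the claims in one evidence sentence.

    Assumes each score lies in [0, 1]. "mean" averages the scores;
    "product" multiplies them, so any weakly supported claim pulls the
    sentence-level score down sharply.
    """
    if not scores:
        raise ValueError("need at least one claim score")
    if method == "mean":
        return fmean(scores)
    if method == "product":
        return math.prod(scores)
    raise ValueError(f"unknown aggregation method: {method!r}")


# Toy body sentence whose three decomposed claims have these support scores.
claim_scores = [1.0, 0.5, 0.5]
print(aggregate_support(claim_scores, "mean"))     # 0.6666666666666666
print(aggregate_support(claim_scores, "product"))  # 0.25
```

For scores in [0, 1], the product never exceeds the smallest score and therefore never exceeds the mean, so product-based aggregation is the stricter of the two.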