
Fact or Fiction: Verifying Scientific Claims

David Wadden, Shanchuan Lin, Kyle Lo, Lucy Lu Wang, Madeleine van Zuylen, Arman Cohan, Hannaneh Hajishirzi

TL;DR

The paper formalizes scientific claim verification and introduces SciFact, a dataset of 1.4K expert-written claims paired with evidence-containing abstracts annotated with labels and rationales. It presents VeriSci, a three-component baseline system for abstract retrieval, rationale selection, and label prediction, and evaluates it in open and oracle settings. The authors show that pretraining on out-of-domain data helps claim verification, while domain-specific data improves rationale extraction; a COVID-19 verification study demonstrates real-world applicability. They release data and code to spur future research in evidence-grounded scientific NLP.
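The three-stage structure described above (retrieve abstracts, select rationale sentences, predict a label) can be illustrated with a minimal Python sketch. This is not the VeriSci implementation: every function name, the token-overlap scoring, the negation cue, the `NOT_ENOUGH_INFO` label name, and the toy corpus are assumptions made for illustration only, standing in for the learned components the paper evaluates.

```python
from dataclasses import dataclass


def tokens(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(text.lower().replace(".", " ").replace(",", " ").split())


def overlap(claim, text):
    """Fraction of claim tokens that also occur in `text` (toy similarity)."""
    claim_tokens = tokens(claim)
    return len(claim_tokens & tokens(text)) / max(len(claim_tokens), 1)


@dataclass
class Prediction:
    doc_id: str
    label: str       # "SUPPORTS", "REFUTES", or hypothetical "NOT_ENOUGH_INFO"
    rationale: list  # indices of the selected rationale sentences


def retrieve_abstracts(claim, corpus, k=1):
    """Stage 1: rank abstracts by similarity to the claim and keep the top k."""
    ranked = sorted(
        corpus,
        key=lambda doc_id: overlap(claim, " ".join(corpus[doc_id])),
        reverse=True,
    )
    return ranked[:k]


def select_rationales(claim, sentences, threshold=0.5):
    """Stage 2: keep sentences whose similarity to the claim passes a threshold."""
    return [i for i, s in enumerate(sentences) if overlap(claim, s) >= threshold]


def predict_label(rationale_sentences):
    """Stage 3: toy label rule; a real system would use a trained classifier."""
    if not rationale_sentences:
        return "NOT_ENOUGH_INFO"
    negated = any(" not " in f" {s.lower()} " for s in rationale_sentences)
    return "REFUTES" if negated else "SUPPORTS"


def verify(claim, corpus, k=1):
    """Run the three stages in sequence for each retrieved abstract."""
    results = []
    for doc_id in retrieve_abstracts(claim, corpus, k):
        sentences = corpus[doc_id]
        selected = select_rationales(claim, sentences)
        label = predict_label([sentences[i] for i in selected])
        results.append(Prediction(doc_id, label, selected))
    return results


# Made-up toy corpus: each abstract is a list of sentences.
corpus = {
    "doc_a": [
        "Troponin levels were higher in patients with cardiac injury.",
        "The study enrolled 200 adults.",
    ],
    "doc_b": [
        "RNA sequencing was performed on a cell line.",
        "Results were replicated twice.",
    ],
}
claim = "Troponin levels are higher in patients with cardiac injury"

for prediction in verify(claim, corpus):
    print(prediction)
```

Keeping the stages as separate functions mirrors the open versus oracle distinction: an oracle run could bypass `retrieve_abstracts` or `select_rationales` and pass gold abstracts or gold sentences directly to the next stage.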

Abstract

We introduce scientific claim verification, a new task to select abstracts from the research literature containing evidence that SUPPORTS or REFUTES a given scientific claim, and to identify rationales justifying each decision. To study this task, we construct SciFact, a dataset of 1.4K expert-written scientific claims paired with evidence-containing abstracts annotated with labels and rationales. We develop baseline models for SciFact, and demonstrate that simple domain adaptation techniques substantially improve performance compared to models trained on Wikipedia or political news. We show that our system is able to verify claims related to COVID-19 by identifying evidence from the CORD-19 corpus. Our experiments indicate that SciFact will provide a challenging testbed for the development of new systems designed to retrieve and reason over corpora containing specialized domain knowledge. Data and code for this new task are publicly available at https://github.com/allenai/scifact. A leaderboard and COVID-19 fact-checking demo are available at https://scifact.apps.allenai.org.

Paper Structure

This paper contains 34 sections, 7 figures, 8 tables.

Figures (7)

  • Figure 1: A scientific claim, supported by evidence identified by our system. To correctly verify this claim, the system must possess background knowledge that troponin is a protein found in cardiac muscle and that elevated levels of troponin are a marker of cardiac injury. In addition, it must be able to reason about directional relationships between scientific processes: replacing "higher" with "lower" would cause the rationale to Refute the claim rather than Support it. Finally, the system should interpret $p < 0.001$ as an indication that the reported finding is statistically significant.
  • Figure 2: Corpus construction. Citing abstracts are identified for each seed document. A claim is written based on the source citance in the citing abstract.
  • Figure 3: Most frequently occurring Medical Subject Headings (MeSH) terms (y-axis) among cited abstracts. MeSH is a controlled vocabulary used for indexing articles in PubMed. Topics range from clinical trial reports ("Humans", "Risk Factors") to molecular biology ("Cell Line", "RNA").
  • Figure 4: A claim written based on a citance. Material unrelated to the citation is removed. The acronym "CVD" is expanded to "cardiovascular disease".
  • Figure 5: A claim supported by two rationales from the same abstract. The text of each rationale on its own provides sufficient evidence to verify the claim.
  • ...and 2 more figures