HealthFC: Verifying Health Claims with Evidence-Based Medical Fact-Checking

Juraj Vladika, Phillip Schneider, Florian Matthes

TL;DR

HealthFC is a dataset of 750 health-related claims in German and English, labeled for veracity by medical experts and backed with evidence from systematic reviews and clinical trials. Baseline experiments show that it is a challenging test bed with high potential for future use.

Abstract

In the digital age, seeking health advice on the Internet has become a common practice. At the same time, determining the trustworthiness of online medical content is increasingly challenging. Fact-checking has emerged as an approach to assess the veracity of factual claims using evidence from credible knowledge sources. To help advance automated Natural Language Processing (NLP) solutions for this task, in this paper we introduce a novel dataset, HealthFC. It consists of 750 health-related claims in German and English, labeled for veracity by medical experts and backed with evidence from systematic reviews and clinical trials. We provide an analysis of the dataset, highlighting its characteristics and challenges. The dataset can be used for NLP tasks related to automated fact-checking, such as evidence retrieval, claim verification, or explanation generation. For testing purposes, we provide baseline systems based on different approaches, examine their performance, and discuss the findings. We show that the dataset is a challenging test bed with high potential for future use.

Paper Structure

This paper contains 24 sections, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Number of collected health fact-check articles by year of publication.
  • Figure 2: Distribution of the top ten most popular health topics in the collected dataset.
  • Figure 3: Evidence level count by verdict label. NEI denotes "not enough information".