HealthFC: Verifying Health Claims with Evidence-Based Medical Fact-Checking

Juraj Vladika; Phillip Schneider; Florian Matthes

TL;DR

HealthFC is a dataset of 750 health-related claims in German and English, labeled for veracity by medical experts and backed with evidence from systematic reviews and clinical trials. Baseline experiments show that it is a challenging test bed with high potential for future use.

Abstract

In the digital age, seeking health advice on the Internet has become a common practice. At the same time, determining the trustworthiness of online medical content is increasingly challenging. Fact-checking has emerged as an approach to assess the veracity of factual claims using evidence from credible knowledge sources. To help advance automated Natural Language Processing (NLP) solutions for this task, in this paper we introduce a novel dataset HealthFC. It consists of 750 health-related claims in German and English, labeled for veracity by medical experts and backed with evidence from systematic reviews and clinical trials. We provide an analysis of the dataset, highlighting its characteristics and challenges. The dataset can be used for NLP tasks related to automated fact-checking, such as evidence retrieval, claim verification, or explanation generation. For testing purposes, we provide baseline systems based on different approaches, examine their performance, and discuss the findings. We show that the dataset is a challenging test bed with a high potential for future use.
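To make the tasks named in the abstract concrete, the following is a minimal sketch of how one record might be represented and how a naive evidence-retrieval step could rank sentences against a claim. Everything in it is an assumption for illustration: the class and field names (`ClaimRecord`, `sentences`, `verdict`), the `SUPPORTED` and `REFUTED` label names, and the word-overlap scoring are hypothetical and are not the dataset's actual schema or the paper's baseline systems. Only the "not enough information" (NEI) label is taken from the figure captions.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    # Label names are assumed; only NEI is named in the source text.
    SUPPORTED = "supported"
    REFUTED = "refuted"
    NEI = "not enough information"


@dataclass
class ClaimRecord:
    # Hypothetical record layout, not the official HealthFC schema.
    claim: str
    sentences: list[str]  # sentences of the accompanying evidence text
    verdict: Verdict


def tokenize(text: str) -> set[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = {t.strip(".,;:!?()\"'").lower() for t in text.split()}
    return tokens - {""}


def retrieve_evidence(record: ClaimRecord, k: int = 2) -> list[str]:
    """Return the k sentences with the highest Jaccard overlap with the claim."""
    claim_tokens = tokenize(record.claim)

    def jaccard(sentence: str) -> float:
        sentence_tokens = tokenize(sentence)
        union = claim_tokens | sentence_tokens
        if not union:
            return 0.0
        return len(claim_tokens & sentence_tokens) / len(union)

    # sorted() is stable, so ties keep their original document order.
    ranked = sorted(record.sentences, key=jaccard, reverse=True)
    return ranked[:k]


# Toy record written for this sketch; it is not taken from HealthFC.
example = ClaimRecord(
    claim="Does vitamin C prevent colds?",
    sentences=[
        "The weather was mild this year.",
        "Vitamin C does not prevent colds in the general population.",
        "Studies were of moderate quality.",
    ],
    verdict=Verdict.REFUTED,
)
top_sentences = retrieve_evidence(example, k=1)
```

A claim-verification step would then take the claim together with the selected sentences and predict one of the verdict labels. Lexical overlap is used here only because it is easy to read; a learned sentence encoder would be a more realistic choice.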

Paper Structure (24 sections, 3 figures, 5 tables)

This paper contains 24 sections, 3 figures, 5 tables.

Introduction
Related Work
Medical NLP Tasks
Medical Fact-Checking
Dataset Construction
Data Source
Claims and Labels
Evidence Annotation
Dataset Description
General Overview of Dataset
Descriptive Statistics of Dataset
Baselines
Problem Statement
Pipeline Systems
Joint Systems
...and 9 more sections

Figures (3)

Figure 1: Number of collected health fact-check articles by year of publication.
Figure 2: Distribution of the top ten most popular health topics in the collected dataset.
Figure 3: Evidence level count by verdict label. NEI denotes "not enough information".
