Reading and Reasoning over Chart Images for Evidence-based Automated Fact-Checking

Mubashara Akhtar; Oana Cocarascu; Elena Simperl

TL;DR

The paper defines chart-based fact-checking and presents ChartBERT, a reading-generation-embedding pipeline that fuses OCR-derived chart text with structural cues to verify claims against chart evidence. It introduces ChartFC, a 15,886-sample dataset derived from TabFact to benchmark chart-based evidence verification and systematically evaluates 75 vision-language baselines, with ChartBERT achieving 63.8% accuracy. The work demonstrates feasibility but also highlights substantial challenges in numerical reasoning and chart variability, underscoring the need for further multimodal and chart-specific reasoning research. Overall, it provides a new task, a first chart-focused AFC model, and a large benchmark to spur progress in evidence-based verification using chart imagery.

Abstract

Evidence data for automated fact-checking (AFC) can be in multiple modalities such as text, tables, images, audio, or video. While there is increasing interest in using images for AFC, previous works mostly focus on detecting manipulated or fake images. We propose a novel task, chart-based fact-checking, and introduce ChartBERT as the first model for AFC against chart evidence. ChartBERT leverages textual, structural and visual information of charts to determine the veracity of textual claims. For evaluation, we create ChartFC, a new dataset of 15,886 charts. We systematically evaluate 75 different vision-language (VL) baselines and show that ChartBERT outperforms VL models, achieving 63.8% accuracy. Our results suggest that the task is complex yet feasible, with many challenges ahead.

Paper Structure (23 sections, 10 equations, 8 figures, 7 tables)

Introduction
Related Work
Verifying Claims against Evidence
Automated Fact-Checking with Images
Chart Images in Other NLP Tasks
ChartBERT Model
Task Formulation
Reading Text from Chart Images
Text Sequence Generation
Encoding and Classification
Evaluation
ChartFC Dataset
The TabFact Dataset
Creation Pipeline
Dataset Evaluation
...and 8 more sections

Figures (8)

Figure 1: An example from the ChartFC dataset where the claim is supported by the evidence chart.
Figure 2: The ChartBERT architecture.
Figure 3: ChartBERT input representation, with the text extracted from the chart and concatenated following the approach described in the Text Sequence Generation section. We add structural embeddings (i.e. x and y coordinates and label embeddings) to the BERT input embeddings (i.e. token, segment and position embeddings).
Figure 4: Dataset creation process.
Figure 5: Number of chart reasoning types found in 100 dataset entries.
...and 3 more figures
