
Reading and Reasoning over Chart Images for Evidence-based Automated Fact-Checking

Mubashara Akhtar, Oana Cocarascu, Elena Simperl

TL;DR

The paper defines chart-based fact-checking and presents ChartBERT, a reading-generation-embedding pipeline that fuses OCR-derived chart text with structural cues to verify claims against chart evidence. It introduces ChartFC, a 15,886-sample dataset derived from TabFact for benchmarking chart-based evidence verification, and systematically evaluates 75 vision-language baselines, with ChartBERT achieving 63.8% accuracy. The work demonstrates that the task is feasible but also highlights substantial challenges in numerical reasoning and chart variability, underscoring the need for further research on multimodal and chart-specific reasoning. Overall, it provides a new task, the first chart-focused AFC model, and a large benchmark to spur progress in evidence-based verification using chart imagery.

Abstract

Evidence data for automated fact-checking (AFC) can be in multiple modalities such as text, tables, images, audio, or video. While there is increasing interest in using images for AFC, previous works mostly focus on detecting manipulated or fake images. We propose a novel task, chart-based fact-checking, and introduce ChartBERT as the first model for AFC against chart evidence. ChartBERT leverages textual, structural and visual information of charts to determine the veracity of textual claims. For evaluation, we create ChartFC, a new dataset of 15,886 charts. We systematically evaluate 75 different vision-language (VL) baselines and show that ChartBERT outperforms VL models, achieving 63.8% accuracy. Our results suggest that the task is complex yet feasible, with many challenges ahead.

Paper Structure (23 sections, 10 equations, 8 figures, 7 tables)

Figures (8)

  • Figure 1: An example from the ChartFC dataset where the claim is supported by the evidence chart.
  • Figure 2: The ChartBERT architecture.
  • Figure 3: ChartBERT input representation, with the text extracted from the chart and concatenated following the sequence-generation approach described in the paper. We add structural embeddings (i.e., x- and y-coordinate embeddings and label embeddings) to the BERT input embeddings (i.e., token, segment and position embeddings).
  • Figure 4: Dataset creation process.
  • Figure 5: Number of chart reasoning types found in 100 dataset entries.
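Figure 3 describes ChartBERT's input as the usual BERT token, segment and position embeddings plus extra structural embeddings for x and y coordinates and a label. The sketch below is a minimal pure-Python illustration of how such per-token embeddings could be summed. It assumes learned lookup tables (stood in for here by random vectors), coordinates already quantized into integer bins, and a small set of text-element labels. The sizes, binning scheme, label set, segment assignment and the names `table` and `embed` are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical sizes; the paper's actual dimensions and vocabularies are not given here.
HIDDEN = 8       # embedding width
VOCAB = 100      # token vocabulary size
MAX_POS = 64     # maximum sequence length
N_BINS = 32      # assumption: x/y coordinates quantized into 32 bins
N_LABELS = 4     # assumption: 0 = not chart text (e.g. claim), 1-3 = e.g. title/axis/legend

rng = random.Random(0)


def table(rows, dim=HIDDEN):
    """A table of random vectors standing in for a learned embedding lookup."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


# Standard BERT input embeddings.
token_emb = table(VOCAB)
segment_emb = table(2)
position_emb = table(MAX_POS)
# Additional structural embeddings, as in the Figure 3 caption.
x_emb = table(N_BINS)
y_emb = table(N_BINS)
label_emb = table(N_LABELS)


def embed(token_ids, segment_ids, x_bins, y_bins, label_ids):
    """Return one vector per token: the element-wise sum of all six embeddings."""
    out = []
    rows = zip(token_ids, segment_ids, x_bins, y_bins, label_ids)
    for pos, (t, s, x, y, lab) in enumerate(rows):
        vecs = (token_emb[t], segment_emb[s], position_emb[pos],
                x_emb[x], y_emb[y], label_emb[lab])
        # zip(*vecs) groups the i-th component of every vector; sum them.
        out.append([sum(components) for components in zip(*vecs)])
    return out


# Toy usage: one claim token (segment 0, no coordinates, label 0)
# followed by two chart-text tokens (segment 1) with coordinates and labels.
vectors = embed([5, 17, 42], [0, 1, 1], [0, 10, 31], [0, 7, 12], [0, 2, 3])
print(len(vectors), len(vectors[0]))  # 3 8
```

Summation keeps the hidden size unchanged, so the structural signal can be injected without altering the rest of a BERT-style encoder; concatenating the embeddings instead would require changing the model's input width.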