Is this chart lying to me? Automating the detection of misleading visualizations

Jonathan Tonglet, Jan Zimny, Tinne Tuytelaars, Iryna Gurevych

TL;DR

This work tackles the problem of misleading visualizations by introducing Misviz and Misviz-synth, two large open benchmarks for detecting misleaders. It develops a rule-based linter leveraging axis metadata, and image-axis classifiers that combine visual and axis information, evaluating them against state-of-the-art MLLMs. The results reveal a clear generalization gap: MLLMs excel on real-world data, while axis-aware methods dominate synthetic data, and axis extraction models trained on synthetic data struggle to generalize to real-world charts. The datasets and baselines enable targeted improvements for safeguarding readers and supporting chart designers, while highlighting future directions such as broader misleader taxonomies and improved axis extraction generalization.

Abstract

Misleading visualizations are a potent driver of misinformation on social media and the web. By violating chart design principles, they distort data and lead readers to draw inaccurate conclusions. Prior work has shown that both humans and multimodal large language models (MLLMs) are frequently deceived by such visualizations. Automatically detecting misleading visualizations and identifying the specific design rules they violate could help protect readers and reduce the spread of misinformation. However, the training and evaluation of AI models has been limited by the absence of large, diverse, and openly available datasets. In this work, we introduce Misviz, a benchmark of 2,604 real-world visualizations annotated with 12 types of misleaders. To support model training, we also create Misviz-synth, a synthetic dataset of 57,665 visualizations generated using Matplotlib and based on real-world data tables. We perform a comprehensive evaluation on both datasets using state-of-the-art MLLMs, rule-based systems, and image-axis classifiers. Our results reveal that the task remains highly challenging. We release Misviz, Misviz-synth, and the accompanying code.

Paper Structure

This paper contains 37 sections, 16 figures, 12 tables.

Figures (16)

  • Figure 1: Examples of the 12 types of misleaders included in Misviz. Appendix \ref{sec:solution} explains how these visualizations misrepresent their underlying data table.
  • Figure 2: The two-step process to create the synthetic visualizations of Misviz-synth based on real-world data.
  • Figure 3: The three types of baselines included in the experiments. The linter and one classifier require axis extraction as an intermediate step.
  • Figure 4: Examples of bounding box predictions for two instances of Misviz. The ground truth boxes are shown on the left, and the predictions on the right.
  • Figure 5: Examples of misleading visualizations from Figure \ref{fig:misleaders}, overlaid with visual explanations.
  • ...and 11 more figures