How Good (Or Bad) Are LLMs at Detecting Misleading Visualizations?

Leo Yu-Ho Lo, Huamin Qu

TL;DR

This work investigates how well multimodal LLMs detect misleading visualizations by treating charts as bitmap images and evaluating prompting strategies. It employs four multimodal LLMs and nine prompts, and expands the detection scope from five to 21 chart issues using a dataset of internet-sourced misleading charts and a valid-chart baseline. Chain-of-Thought prompting emerges as the most effective strategy, though prompt length becomes a scalability challenge as more issues are covered; factual-question prompts can improve correctness, while some prompts risk over-reporting issues. The study demonstrates the potential of multimodal LLMs to bolster visualization literacy and counter misinformation, and outlines future directions involving prompt pools, agentic prompting, and benchmark datasets.

Abstract

In this study, we address the growing issue of misleading charts, a prevalent problem that undermines the integrity of information dissemination. Misleading charts can distort the viewer's perception of data, leading to misinterpretations and decisions based on false information. The development of effective automatic detection methods for misleading charts is an urgent field of research. The recent advancement of multimodal Large Language Models (LLMs) has introduced a promising direction for addressing this challenge. We explored the capabilities of these models in analyzing complex charts and assessing the impact of different prompting strategies on the models' analyses. We utilized a dataset of misleading charts collected from the internet by prior research and crafted nine distinct prompts, ranging from simple to complex, to test the ability of four different multimodal LLMs in detecting over 21 different chart issues. Through three experiments--from initial exploration to detailed analysis--we progressively gained insights into how to effectively prompt LLMs to identify misleading charts and developed strategies to address the scalability challenges encountered as we expanded our detection range from the initial five issues to 21 issues in the final experiment. Our findings reveal that multimodal LLMs possess a strong capability for chart comprehension and critical thinking in data interpretation. There is significant potential in employing multimodal LLMs to counter misleading information by supporting critical thinking and enhancing visualization literacy. This study demonstrates the applicability of LLMs in addressing the pressing concern of misleading charts.

Paper Structure (12 sections, 4 figures, 4 tables)

This paper contains 12 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Experiment Two Prompts: Prompt #4 poses a checklist for LLMs to go through. Prompt #5 applies the Chain of Thought strategy. Prompt #6 modifies Prompt #5 to prevent LLMs from bypassing reasoning questions. The example chart contains fictional data for parody purposes. Phrases in blue denote accurate interpretations. Portions of the prompt and responses are omitted for clarity.
  • Figure 2: Experiment Three Prompt #7 extends Prompt #6 in applying the Chain of Thought strategy to include additional chart issue definitions. The example chart has a major issue of setting the y-axis range inappropriately. Phrases in blue denote accurate interpretations. Portions of the prompt and responses are omitted for clarity.
  • Figure 3: Issues detected by LLMs across different prompts. Each issue appeared three times in the test set. Edge numbers indicate the correct identification percentage for each row or column.
  • Figure 4: Percentage of factual questions accurately answered by LLMs on chart properties, axes, scale, and encoding. Edge numbers indicate the correct answer percentage for each row or column.
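
The "edge numbers" described for Figures 3 and 4 are row and column percentages over a grid of results. The following is a minimal sketch of that arithmetic, not the authors' code. It assumes a hypothetical grid in which each cell holds the number of correct detections out of a fixed number of trials (three per issue, per the Figure 3 caption). The function name, the row/column layout, and the counts are illustrative assumptions.

```python
def edge_percentages(correct, trials_per_cell):
    """Return (row_pct, col_pct) for a grid of correct-detection counts.

    correct[r][c] is the number of correct detections in row r and column c
    (hypothetical layout: rows = chart issues, columns = model/prompt pairs).
    trials_per_cell is how many charts were tested per cell.
    """
    n_rows = len(correct)
    n_cols = len(correct[0])
    # Row edge: share of correct detections across all columns of that row.
    row_pct = [100.0 * sum(row) / (n_cols * trials_per_cell) for row in correct]
    # Column edge: share of correct detections across all rows of that column.
    col_pct = [
        100.0 * sum(correct[r][c] for r in range(n_rows)) / (n_rows * trials_per_cell)
        for c in range(n_cols)
    ]
    return row_pct, col_pct


# Made-up counts for two issues and three model/prompt pairs (not from the paper).
counts = [
    [3, 0, 3],
    [0, 3, 3],
]
row_pct, col_pct = edge_percentages(counts, trials_per_cell=3)
print(row_pct, col_pct)
```

With these made-up counts, the third column is detected correctly in every trial, so its edge value is 100, while the other two columns sit at 50.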