Can GPT-4 Models Detect Misleading Visualizations?
Jason Alexander, Priyal Nanda, Kai-Cheng Yang, Ali Sarvghad
TL;DR
This paper investigates whether large vision-language models (LVLMs) from the GPT-4 family can detect misleading visualizations shared on social media. It uses a large dataset of tweet–visualization pairs containing 888 reasoning misleaders and 730 design misleaders, and it tests four prompting regimes across three GPT-4 models. Naive zero-shot prompting yields moderate detection accuracy, while guided zero-shot and guided few-shot prompting substantially improve it. The best overall performance comes from guided zero-shot prompting (AUC ≈ 0.821), although the most effective strategy depends on the misleader type. The work demonstrates the feasibility of LVLM-based detection of visual misinformation and highlights the importance of prompt design for practical deployment and further research.
Abstract
The proliferation of misleading visualizations online, particularly during critical events such as public health crises and elections, poses a significant risk. This study investigates the capability of GPT-4 models (GPT-4V, GPT-4o, and GPT-4o mini) to detect misleading visualizations. Using a dataset of tweet-visualization pairs containing various visual misleaders, we test these models under four experimental conditions with different levels of guidance. We show that GPT-4 models can detect misleading visualizations with moderate accuracy without any task-specific guidance or examples (naive zero-shot), and that performance notably improves when the models are given definitions of misleaders (guided zero-shot). However, no single prompt engineering technique yields the best results for all misleader types. Specifically, providing the models with both misleader definitions and examples (guided few-shot) proves more effective for reasoning misleaders, while guided zero-shot performs better for design misleaders. This study underscores the feasibility of using large vision-language models to detect visual misinformation and the importance of prompt engineering for optimizing detection accuracy.
