Making AI Agents Evaluate Misleading Charts without Nudging

Swaroop Panda

TL;DR

AI agents are increasingly used as low-cost proxies for early visualization evaluation, but they may underweight data integrity when not explicitly prompted to check it. The study tests un-nudged AI judgments on deliberately flawed charts: a GPT-5.2-based agent rates ten visualizations on the BeauVis and PREVis scales in a single pass. Aesthetic appeal and perceived readability often remain high despite integrity flaws, with recurring patterns such as an emphasis on low-level values over higher-level patterns, which points to a mismatch between surface signals and data accuracy. The work shows the limits of unprompted AI screening for graphical integrity and recommends pairing it with explicit integrity checks or targeted prompts that assess whether encodings preserve quantitative relationships, as a step toward better-designed AI-assisted visualization evaluation pipelines.

Abstract

AI agents are increasingly used as low-cost proxies for early visualization evaluation. In an initial study of deliberately flawed charts, we test whether agents spontaneously penalise chart junk and misleading encodings without being prompted to look for errors. Using established scales (BeauVis and PREVis), the agent evaluated visualizations containing decorative clutter, manipulated axes, and distorted proportional cues. The ratings of aesthetic appeal and perceived readability often remained relatively high even when graphical integrity was compromised. These results suggest that un-nudged AI agent evaluation may underweight integrity-related defects unless such checks are explicitly elicited.
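
One way to make an explicit integrity check concrete is Tufte's lie factor, the ratio of the relative change drawn in a graphic to the relative change in the underlying data. The sketch below is an illustration only and is not part of the study's method: the function names, the bar-chart setup, and the 0.05 tolerance are assumptions chosen for the example. It flags a bar chart whose truncated axis exaggerates a difference between two values.

```python
def lie_factor(data_start, data_end, drawn_start, drawn_end):
    """Relative change shown in the graphic divided by relative change in the data."""
    if data_start == 0 or drawn_start == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    data_effect = (data_end - data_start) / abs(data_start)
    if data_effect == 0:
        raise ValueError("the data show no change, so the ratio is undefined")
    drawn_effect = (drawn_end - drawn_start) / abs(drawn_start)
    return drawn_effect / data_effect


def bar_length(value, axis_min, pixels_per_unit=1.0):
    """Drawn length of a bar whose axis starts at axis_min (hypothetical helper)."""
    return (value - axis_min) * pixels_per_unit


def check_bar_pair(v1, v2, axis_min, tolerance=0.05):
    """Return (lie factor, passes) for two bars; tolerance is an illustrative threshold."""
    lf = lie_factor(v1, v2, bar_length(v1, axis_min), bar_length(v2, axis_min))
    return lf, abs(lf - 1.0) <= tolerance


if __name__ == "__main__":
    # Same two values, drawn once with a zero baseline and once with a truncated axis.
    print(check_bar_pair(100, 110, axis_min=0))
    print(check_bar_pair(100, 110, axis_min=90))
```

With a zero baseline the drawn and actual changes agree, so the lie factor is 1; starting the axis at 90 makes a 10% difference look like a doubling. A check of this kind tests proportional fidelity directly, independent of how appealing or readable the chart appears.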

Paper Structure (5 sections, 3 tables)

This paper contains 5 sections, 3 tables.