The Plot Thickens: Quantitative Part-by-Part Exploration of MLLM Visualization Literacy
Matheus Valentim, Vaishali Dhanoa, Gabriela Molina León, Niklas Elmqvist
TL;DR
This study probes which visualization features most influence how well multimodal large language models (MLLMs) interpret charts. By expanding the Visualization Literacy Assessment Test (VLAT) to VLAT ex with 380 visualizations and 3,220 QA items, the authors systematically vary plot type, color palette, and title, enabling robust, repeated evaluations of two MLLMs. Regression analyses and reliability metrics reveal that plot type and title framing significantly affect performance, while color palettes have little effect. These findings inform practical design principles for visualizations intended to be read by MLLMs. The work also releases VLAT ex to the community to standardize future benchmarking of visualization literacy in MLLMs.
Abstract
Multimodal Large Language Models (MLLMs) can interpret data visualizations, but what makes a visualization understandable to these models? Do factors like color, shape, and text influence legibility, and how does this compare to human perception? In this paper, we build on prior work to systematically assess which visualization characteristics impact MLLM interpretability. We expanded the Visualization Literacy Assessment Test (VLAT) test set from 12 to 380 visualizations by varying plot types, colors, and titles. This allowed us to statistically analyze how these features affect model performance. Our findings suggest that while color palettes have no significant impact on accuracy, plot type and title type significantly affect MLLM performance. We observe similar trends for model omissions. Based on these insights, we examine which plot types benefit MLLMs in different tasks and propose visualization design principles that enhance MLLM readability. Additionally, we make the extended VLAT test set, VLAT ex, publicly available at https://osf.io/ermwx/ together with our supplemental material for future model testing and evaluation.
