The Plot Thickens: Quantitative Part-by-Part Exploration of MLLM Visualization Literacy

Matheus Valentim, Vaishali Dhanoa, Gabriela Molina León, Niklas Elmqvist

TL;DR

This study probes which visualization features most influence how well multimodal large language models (MLLMs) interpret charts. By expanding the Visualization Literacy Assessment Test (VLAT) to VLAT ex with 380 visualizations and 3,220 QA items, the authors systematically vary plot type, color palette, and title, enabling repeated evaluations of two MLLMs. Regression analyses and reliability metrics reveal that plot type and title framing significantly affect performance, while color palettes have little effect; these results inform practical principles for designing visualizations that MLLMs can read. The work also releases VLAT ex to the community to standardize future benchmarking of visualization literacy in MLLMs.

Abstract

Multimodal Large Language Models (MLLMs) can interpret data visualizations, but what makes a visualization understandable to these models? Do factors like color, shape, and text influence legibility, and how does this compare to human perception? In this paper, we build on prior work to systematically assess which visualization characteristics impact MLLM interpretability. We expanded the Visualization Literacy Assessment Test (VLAT) test set from 12 to 380 visualizations by varying plot types, colors, and titles. This allowed us to statistically analyze how these features affect model performance. Our findings suggest that while color palettes have no significant impact on accuracy, plot types and the type of title significantly affect MLLM performance. We observe similar trends for model omissions. Based on these insights, we look into which plot types are beneficial for MLLMs in different tasks and propose visualization design principles that enhance MLLM readability. Additionally, we make the extended VLAT test set, VLAT ex, publicly available on https://osf.io/ermwx/ together with our supplemental material for future model testing and evaluation.
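
As a rough illustration of the design described above, the following Python sketch crosses three factors (plot type, color palette, title type) to enumerate visualization variants and then summarizes correctness per factor level. The factor levels, the function names `build_variants` and `accuracy_by`, and the full-factorial crossing are assumptions made for illustration only; the actual composition of the 380 VLAT ex visualizations and the statistical analysis used in the paper are not reproduced here.

```python
from collections import defaultdict
from itertools import product

# Hypothetical factor levels; the levels actually used in VLAT ex are not listed here.
PLOT_TYPES = ["bar", "line", "scatter"]
PALETTES = ["palette_a", "palette_b"]
TITLE_TYPES = ["none", "descriptive"]


def build_variants(plot_types, palettes, title_types):
    """Return one spec dict per combination of the three factors (full crossing assumed)."""
    return [
        {"plot_type": p, "palette": c, "title_type": t}
        for p, c, t in product(plot_types, palettes, title_types)
    ]


def accuracy_by(records, factor):
    """Mean correctness per level of `factor`; each record carries a boolean 'correct'."""
    totals = defaultdict(lambda: [0, 0])  # level -> [correct count, total count]
    for record in records:
        totals[record[factor]][0] += int(record["correct"])
        totals[record[factor]][1] += 1
    return {level: hits / n for level, (hits, n) in totals.items()}


variants = build_variants(PLOT_TYPES, PALETTES, TITLE_TYPES)
```

Per-level accuracy of this kind is only a descriptive summary; testing whether a factor has a significant effect would require a regression or similar model on top of it.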

Paper Structure

This paper contains 28 sections, 2 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Examples of original and altered plots. VLAT plot and altered plot, both visualizing monthly oil price.
  • Figure 2: Color palettes. The ten color palettes used in our experiment.
  • Figure 3: Correct and omitted answers by plot type. Accuracy and omission counts distribution per plot type.
  • Figure 4: Plot type aces and all wrongs. Aces and All Wrongs percentage relative to all questions, by plot types.
  • Figure 5: Plot types and accuracy differences. Pairwise statistically different plot types.
  • ...and 2 more figures