Charts-of-Thought: Enhancing LLM Visualization Literacy Through Structured Data Extraction

Amit Kumar Das, Mohammad Tarun, Klaus Mueller

TL;DR

The paper tackles the challenge that contemporary multimodal LLMs struggle with visualization literacy. It proposes Charts-of-Thought, a structured prompting framework that guides data extraction, verification, and analysis before answering visualization questions. Across modified and original VLAT datasets, the approach yields substantial gains, with Claude-3.7-sonnet surpassing human baselines and other models achieving notable improvements. The work also demonstrates applicability to chart question answering and discusses implications for automating visualization evaluation, accessibility, and data-driven interfaces, while outlining limitations and avenues for future research.

Abstract

This paper evaluates the visualization literacy of modern Large Language Models (LLMs) and introduces a novel prompting technique called Charts-of-Thought. We tested three state-of-the-art LLMs (Claude-3.7-sonnet, GPT-4.5 preview, and Gemini-2.0-pro) on the Visualization Literacy Assessment Test (VLAT) using standard prompts and our structured approach. The Charts-of-Thought method guides LLMs through a systematic data extraction, verification, and analysis process before answering visualization questions. Our results show Claude-3.7-sonnet achieved a score of 50.17 using this method, far exceeding the human baseline of 28.82. This approach improved performance across all models, with score increases of 21.8% for GPT-4.5, 9.4% for Gemini-2.0, and 13.5% for Claude-3.7 compared to standard prompting. The performance gains were consistent across original and modified VLAT charts, with Claude correctly answering 100% of questions for several chart types that previously challenged LLMs. Our study reveals that modern multimodal LLMs can surpass human performance on visualization literacy tasks when given the proper analytical framework. These findings establish a new benchmark for LLM visualization literacy and demonstrate the importance of structured prompting strategies for complex visual interpretation tasks. Beyond improving LLM visualization literacy, Charts-of-Thought could also enhance the accessibility of visualizations, potentially benefiting individuals with visual impairments or lower visualization literacy.
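The three stages named in the abstract (data extraction, verification, analysis) can be illustrated with a small prompt builder. This is only a sketch under stated assumptions: the stage wording, the `STAGES` constant, and the `build_structured_prompt` function are hypothetical and do not reproduce the authors' actual Charts-of-Thought prompt, which is not quoted in this summary.

```python
# Hypothetical stage wording (an assumption for illustration); the paper's
# actual Charts-of-Thought prompt text is not reproduced here.
STAGES = (
    ("Data extraction",
     "List every data point you can read from the chart as a table, "
     "including axis labels and units."),
    ("Verification",
     "Re-check the extracted table against the chart and correct any "
     "value that does not match."),
    ("Analysis",
     "Answer the question using only the verified table, then state "
     "the chosen option."),
)


def build_structured_prompt(question: str, options: list[str]) -> str:
    """Assemble a staged prompt: extract, verify, then analyze."""
    lines = ["You are given a chart image. Work through the steps in order."]
    # Number the stages so the model is asked to follow them in sequence.
    for number, (title, instruction) in enumerate(STAGES, start=1):
        lines.append(f"Step {number} ({title}): {instruction}")
    lines.append(f"Question: {question}")
    # Label the multiple-choice options A, B, C, ...
    for label, option in zip("ABCDEFGH", options):
        lines.append(f"{label}. {option}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_structured_prompt(
        "Which month had the highest price?", ["January", "June"]))
```

The resulting string would be sent to a multimodal model together with the chart image. A generic prompt, by contrast, would contain only the question and its options, which is the comparison the abstract describes.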

Paper Structure

This paper contains 36 sections, 8 figures, and 5 tables.

Figures (8)

  • Figure 1: Complete set of 12 visualization types, recreated from VLAT examples with modified data. These charts represent the full scope of visualization literacy tasks tested, spanning fundamental chart types from simple bar charts to complex treemaps and choropleth maps. Each visualization type was evaluated with 3-8 associated questions to assess different analytical tasks.
  • Figure 2: The responses of the three tested LLMs to the modified VLAT Q39 (a hard question) under both the Generic prompt and our Charts-of-Thought prompt. With the Generic prompt, none of the models came close to the correct answer; with our Charts-of-Thought prompt, two models returned the correct answer and the third came close.
  • Figure 3: Modified VLAT results by question difficulty showing Charts-of-Thought improvements across Easy, Moderate, and Hard questions for all three LLM models.
  • Figure 4: Modified VLAT results by task type comparing Generic and Charts-of-Thought prompting performance across eight analytical tasks.
  • Figure 5: Modified VLAT results by chart type showing performance differences between prompting strategies across 12 visualization types.
  • ...and 3 more figures