Unraveling the Truth: Do VLMs really Understand Charts? A Deep Dive into Consistency and Robustness

Srija Mukhopadhyay, Adnan Qidwai, Aparna Garimella, Pritika Ramu, Vivek Gupta, Dan Roth

TL;DR

This work probes chart question answering (CQA) by rigorously testing state-of-the-art visual language models across chart types and question complexities, as well as under diverse visual perturbations. It introduces ChartQA-Split to analyze performance by chart and question complexity and RobustCQA to benchmark robustness to perturbations, using zero-shot Chain-of-Thought prompting and an extraction-based approach for smaller models. The study reveals that no model universally excels, with performance hinging on chart and question type and showing substantial drops under perturbations, indicating critical gaps in data extraction and visual reasoning. The findings highlight actionable directions for robust CQA systems, including perturbation-aware training, improved data extraction on non-annotated charts, and interpretable models capable of reliable chart understanding in varied visual representations.

Abstract

Chart question answering (CQA) is a crucial area of Visual Language Understanding. However, the robustness and consistency of current Visual Language Models (VLMs) in this field remain under-explored. This paper evaluates state-of-the-art VLMs on comprehensive datasets, developed specifically for this study, encompassing diverse question categories and chart formats. We investigate two key aspects: 1) the models' ability to handle varying levels of chart and question complexity, and 2) their robustness across different visual representations of the same underlying data. Our analysis reveals significant performance variations based on question and chart types, highlighting both strengths and weaknesses of current models. Additionally, we identify areas for improvement and propose future research directions to build more robust and reliable CQA systems. This study sheds light on the limitations of current models and paves the way for future advancements in the field.
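
The two aspects investigated above can be illustrated with a minimal sketch: accuracy grouped by chart and question complexity, and the accuracy drop between original and perturbed charts. This is not the authors' evaluation code. The record fields (`chart`, `question`, `correct`) and both function names are hypothetical, and plain exact-match accuracy is assumed because this section does not state the scoring metric.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return accuracy per (chart complexity, question complexity) pair.

    `records` is an iterable of dicts with hypothetical keys:
    "chart" (e.g. "simple"/"complex"), "question" (same), "correct" (bool).
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [num_correct, num_total]
    for r in records:
        key = (r["chart"], r["question"])
        totals[key][0] += int(r["correct"])
        totals[key][1] += 1
    return {key: c / n for key, (c, n) in totals.items()}


def robustness_drop(original_correct, perturbed_correct):
    """Accuracy on original charts minus accuracy on perturbed charts.

    Both arguments are non-empty lists of booleans, one per question,
    for the same underlying data rendered in two visual forms.
    """
    acc_original = sum(original_correct) / len(original_correct)
    acc_perturbed = sum(perturbed_correct) / len(perturbed_correct)
    return acc_original - acc_perturbed


if __name__ == "__main__":
    demo = [
        {"chart": "simple", "question": "simple", "correct": True},
        {"chart": "simple", "question": "simple", "correct": False},
        {"chart": "complex", "question": "complex", "correct": False},
    ]
    print(accuracy_by_group(demo))
    print(robustness_drop([True, True, False, True], [True, False, False, False]))
```

A positive drop means the model answered fewer questions correctly once the chart's visual form changed, even though the underlying data stayed the same.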

Paper Structure

This paper contains 41 sections, 27 figures, and 9 tables.

Figures (27)

  • Figure 1: Simple and Complex Questions on a Complex chart
  • Figure 2: Example of simple chart and complex chart, along with simple and complex questions.
  • Figure 3: Examples of different types of perturbations on the same original chart and data.
  • Figure 4: Prompt for testing chart question answering
  • Figure 5: Prompt for extracting answers through an LLM from a different LLM
  • ...and 22 more figures