DomainCQA: Crafting Knowledge-Intensive QA from Domain-Specific Charts

Yujing Lu, Ling Zhong, Jing Yang, Weiming Li, Peng Wei, Yongheng Wang, Manni Duan, Qing Zhang

TL;DR

DomainCQA presents a framework for building domain-specific chart QA benchmarks that evaluate both visual comprehension and knowledge-intensive reasoning. It introduces a Chart Complexity Vector (CCV) for complexity-aware chart selection and a chart-abstract selector that combines Chain-of-Thought reasoning with cross-model voting, and it generates two tiers of QA pairs (Fundamental QA and Advanced QA) that are validated by domain experts. AstroChart, the astronomy instantiation with 482 charts and 1,690 QA pairs, reveals persistent weaknesses in chart perception, numerical reasoning, and domain knowledge integration across 21 MLLMs. Fine-tuning on AstroChart improves performance on both Fundamental and Advanced QA, and pilot QA sets in biochemistry, economics, medicine, and social science indicate that the framework generalizes beyond astronomy. Together, these contributions position DomainCQA as a unified pipeline for constructing, validating, and augmenting domain-specific chart reasoning benchmarks that also serve as effective training data for improving model capabilities.
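The TL;DR names cross-model voting but does not spell out the voting rule. As a minimal sketch, assuming each model reasons with Chain-of-Thought and then emits a final "keep" or "discard" verdict on a chart-abstract pair, and that a pair is kept only when a strict majority of models agree, the aggregation step could look like this. The judge functions, the verdict labels, and the 0.5 threshold are hypothetical placeholders, not the authors' implementation.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical interface: each judge wraps one MLLM that reasons step by step
# (Chain-of-Thought) about a chart and its paper abstract, then returns a
# final verdict string, "keep" or "discard". Only the verdicts are aggregated.
Judge = Callable[[str, str], str]


def cross_model_vote(judges: List[Judge], chart_id: str, abstract: str,
                     min_agreement: float = 0.5) -> bool:
    """Return True if strictly more than `min_agreement` of judges vote "keep"."""
    if not judges:
        raise ValueError("at least one judge is required")
    verdicts = [judge(chart_id, abstract) for judge in judges]
    keep_share = Counter(verdicts)["keep"] / len(verdicts)
    return keep_share > min_agreement


# Toy stand-in judges (placeholders for real model calls).
def strict_judge(chart_id: str, abstract: str) -> str:
    return "keep" if "spectrum" in abstract else "discard"


def lenient_judge(chart_id: str, abstract: str) -> str:
    return "keep"


def length_judge(chart_id: str, abstract: str) -> str:
    return "keep" if len(abstract.split()) >= 5 else "discard"


judges = [strict_judge, lenient_judge, length_judge]
print(cross_model_vote(judges, "fig1", "We analyze the spectrum of quasars"))  # True
print(cross_model_vote(judges, "fig2", "Short note"))  # False
```

Requiring a strict majority means a single permissive model cannot push a weak pair through on its own; a stricter pipeline could raise `min_agreement` toward unanimity.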

Abstract

Chart Question Answering (CQA) evaluates Multimodal Large Language Models (MLLMs) on visual understanding and reasoning over chart data. However, existing benchmarks mostly test surface-level parsing, such as reading labels and legends, while overlooking deeper scientific reasoning. We propose DomainCQA, a framework for constructing domain-specific CQA benchmarks that emphasize both visual comprehension and knowledge-intensive reasoning. It integrates complexity-aware chart selection, multitier QA generation, and expert validation. Applied to astronomy, DomainCQA yields AstroChart, a benchmark of 1,690 QA pairs over 482 charts, exposing persistent weaknesses in fine-grained perception, numerical reasoning, and domain knowledge integration across 21 MLLMs. Fine-tuning on AstroChart improves performance across fundamental and advanced tasks. Pilot QA sets in biochemistry, economics, medicine, and social science further demonstrate DomainCQA's generality. Together, our results establish DomainCQA as a unified pipeline for constructing and augmenting domain-specific chart reasoning benchmarks.
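To make "complexity-aware chart selection" concrete, the sketch below scores charts from their CCVs and keeps the most complex ones. The feature names, the min-max normalization, the summed score, and the top-k rule are all assumptions for illustration; the paper's actual CCV components and scoring are defined in its Appendix A.2.

```python
from typing import Dict, List, Tuple

# Hypothetical CCV: each chart maps named design elements to raw counts.
# The element names used below are illustrative, not the paper's CCV axes.
CCV = Dict[str, float]


def normalize_ccvs(ccvs: Dict[str, CCV]) -> Dict[str, CCV]:
    """Min-max normalize each CCV component to [0, 1] across all charts."""
    keys = sorted({k for vec in ccvs.values() for k in vec})
    lo = {k: min(vec.get(k, 0.0) for vec in ccvs.values()) for k in keys}
    hi = {k: max(vec.get(k, 0.0) for vec in ccvs.values()) for k in keys}
    return {
        chart: {
            # Constant components carry no signal, so they map to 0.0.
            k: (vec.get(k, 0.0) - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
            for k in keys
        }
        for chart, vec in ccvs.items()
    }


def select_complex_charts(ccvs: Dict[str, CCV], k: int) -> List[Tuple[str, float]]:
    """Rank charts by summed normalized CCV (a stand-in score) and return the top k."""
    norm = normalize_ccvs(ccvs)
    scored = [(chart, round(sum(vec.values()), 3)) for chart, vec in norm.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


charts = {
    "bar_simple": {"subplots": 1, "series": 2, "annotations": 0},
    "spectrum":   {"subplots": 2, "series": 5, "annotations": 4},
    "hr_diagram": {"subplots": 1, "series": 3, "annotations": 6},
}
print(select_complex_charts(charts, k=2))
# [('spectrum', 2.667), ('hr_diagram', 1.333)]
```

Normalizing each component before summing keeps a single high-count element (for example, many annotations) from dominating the score; a weighted sum would be a natural variant if some design elements matter more in a given domain.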

Paper Structure

This paper contains 58 sections, 65 figures, 11 tables, and 3 algorithms.

Figures (65)

  • Figure 1: Radar plot comparing chart complexity across domains in terms of visual design features, computed from 500 sampled charts per domain. Each axis represents a normalized design element contributing to overall chart complexity (formally defined later as the Chart Complexity Vector, or CCV). The differences between domains motivate our complexity-aware chart selection strategy.
  • Figure 2: Overview of the DomainCQA framework for constructing domain-specific CQA benchmarks. The pipeline consists of three stages: Chart Selection, QA Pair Generation and Expert QA Validation. The resulting benchmarks support evaluation of both visual comprehension and knowledge-intensive reasoning.
  • Figure 3: Distribution of chart complexity scores, computed from CCVs, across benchmarks. AstroChart shows a broader and higher complexity distribution than other benchmarks, with more domain-specific charts in the 6–10 range (see Appendix A.2 for CCV score details).
  • Figure 4: Performance comparison of MLLMs on CharXiv and AstroChart.
  • Figure 5: Example of a CCV in AstroChart.
  • ...and 60 more figures
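Figure 1 aggregates normalized design features over 500 sampled charts per domain. One plausible way to compute such radar-plot profiles, assuming per-chart CCVs of raw counts, is to average a random sample per domain and then scale each axis by its largest domain mean so that all axes share a [0, 1] range. The scaling choice, the function name, and the field names are hypothetical; the caption states only that each axis is normalized.

```python
import random
from statistics import mean
from typing import Dict, List

# Hypothetical CCV records: one dict of raw design-element counts per chart.
CCV = Dict[str, float]


def domain_radar_profile(charts_by_domain: Dict[str, List[CCV]],
                         sample_size: int = 500,
                         seed: int = 0) -> Dict[str, Dict[str, float]]:
    """Average sampled CCVs per domain, then scale each axis by its max across domains."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    axes = sorted({k for charts in charts_by_domain.values() for c in charts for k in c})
    means = {}
    for domain, charts in charts_by_domain.items():
        sample = rng.sample(charts, min(sample_size, len(charts)))
        means[domain] = {a: mean(c.get(a, 0.0) for c in sample) for a in axes}
    # Scale so the largest domain mean on each axis becomes 1.0 (radar-friendly).
    peak = {a: max(m[a] for m in means.values()) for a in axes}
    return {
        d: {a: (m[a] / peak[a] if peak[a] > 0 else 0.0) for a in axes}
        for d, m in means.items()
    }


data = {
    "astronomy": [{"series": 4, "log_axes": 2}, {"series": 6, "log_axes": 1}],
    "economics": [{"series": 2, "log_axes": 0}, {"series": 3, "log_axes": 0}],
}
print(domain_radar_profile(data, sample_size=500))
# {'astronomy': {'log_axes': 1.0, 'series': 1.0}, 'economics': {'log_axes': 0.0, 'series': 0.5}}
```

Scaling by the cross-domain maximum makes each radar axis a relative comparison between domains rather than an absolute count, which matches the caption's framing of domain-specific differences.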