POLYCHARTQA: Benchmarking Large Vision-Language Models with Multilingual Chart Question Answering

Yichen Xu, Liangyu Chen, Liang Zhang, Jianzhe Ma, Wenxuan Wang, Qin Jin

TL;DR

PolyChartQA addresses the lack of multilingual chart understanding benchmarks by introducing a scalable pipeline that generates a large, high-quality multilingual chart QA dataset across 10 languages. The authors demonstrate substantial performance gaps between English and non-English inputs for state-of-the-art LVLMs, and show that fine-tuning on PolyChartQA-Train yields meaningful gains across model families and sizes. They also provide an in-depth error analysis that identifies OCR errors and language bias as key bottlenecks, and they report data-scale effects and cross-lingual patterns to guide future research. Overall, PolyChartQA lays the groundwork for globally inclusive vision-language models capable of interpreting charts across diverse linguistic contexts, with practical implications for data-driven decision-making worldwide.
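The English versus non-English gap can be quantified as per-language accuracy minus English accuracy. The sketch below assumes a ChartQA-style relaxed accuracy (numeric answers within 5% relative error count as correct; other answers need a case-insensitive exact match). The metric, the `(language, prediction, gold)` record format, and the function names `relaxed_match`, `accuracy_by_language`, and `gap_to_english` are all hypothetical, and the toy records are not results from the paper.

```python
from collections import defaultdict


def relaxed_match(pred: str, gold: str, tol: float = 0.05) -> bool:
    """Assumed ChartQA-style relaxed match: numeric answers may deviate from the
    gold value by up to `tol` relative error; other answers must match exactly
    after trimming whitespace and lowercasing."""
    try:
        p, g = float(pred), float(gold)
    except ValueError:
        return pred.strip().lower() == gold.strip().lower()
    if g == 0:
        return p == 0
    return abs(p - g) / abs(g) <= tol


def accuracy_by_language(records):
    """records: iterable of (language, prediction, gold) triples (hypothetical format)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for lang, pred, gold in records:
        totals[lang] += 1
        hits[lang] += relaxed_match(pred, gold)
    return {lang: hits[lang] / totals[lang] for lang in totals}


def gap_to_english(acc: dict, ref: str = "en") -> dict:
    """Accuracy drop of every other language relative to `ref` (positive = worse)."""
    return {lang: acc[ref] - a for lang, a in acc.items() if lang != ref}


# Toy records for illustration only; these are not results from the paper.
records = [
    ("en", "42", "42"), ("en", "Asia", "asia"),
    ("hi", "40", "42"), ("hi", "यूरोप", "एशिया"),
]
acc = accuracy_by_language(records)
print(acc)                  # {'en': 1.0, 'hi': 0.5}
print(gap_to_english(acc))  # {'hi': 0.5}
```

Reporting the gap per language, rather than a single non-English average, keeps low-resource languages from being masked by higher-resource ones.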

Abstract

Charts are a universally adopted medium for data communication, yet existing chart understanding benchmarks are overwhelmingly English-centric, limiting their accessibility and relevance to global audiences. To address this limitation, we introduce PolyChartQA, the first large-scale multilingual benchmark for chart question answering, comprising 22,606 charts and 26,151 QA pairs across 10 diverse languages. PolyChartQA is constructed through a scalable pipeline that enables efficient multilingual chart generation via data translation and code reuse, supported by LLM-based translation and rigorous quality control. We use PolyChartQA to systematically evaluate state-of-the-art LVLMs on multilingual chart understanding and reveal a significant performance gap between English and other languages, particularly low-resource ones. Additionally, we introduce a companion multilingual chart question answering training set, PolyChartQA-Train; fine-tuning LVLMs on it yields substantial gains in multilingual chart understanding across diverse model sizes and architectures. Together, the benchmark and its training set provide a foundation for developing globally inclusive vision-language models capable of understanding charts across diverse linguistic contexts.
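One way to realize "multilingual chart generation via data translation and code reuse" is to keep every displayed string in the chart's data, so the same rendering code can be rerun on translated data. The minimal sketch below assumes a hypothetical chart record; `render` is a text-based stand-in for real plotting code, `toy_translate` is a dictionary lookup standing in for the LLM-based translation step, and `passes_quality_check` is an illustrative check, not the paper's quality-control procedure.

```python
import copy
from typing import Callable

# Hypothetical seed chart: every displayed string lives in the data, so the
# rendering code below can be reused unchanged for any target language.
seed_chart = {
    "title": "Annual rainfall",
    "x_label": "City",
    "y_label": "Rainfall (mm)",
    "categories": ["Paris", "Madrid", "Rome"],
    "values": [600, 400, 800],
}
TEXT_FIELDS = ("title", "x_label", "y_label")


def render(chart: dict) -> str:
    """Text-based stand-in for reusable plotting code; it hard-codes no strings."""
    lines = [chart["title"], f'{chart["x_label"]} | {chart["y_label"]}']
    scale = max(chart["values"]) / 20  # longest bar is 20 characters
    for cat, val in zip(chart["categories"], chart["values"]):
        lines.append(f"{cat:<10} {'#' * round(val / scale)} {val}")
    return "\n".join(lines)


def translate_chart(chart: dict, translate: Callable[[str], str]) -> dict:
    """Translate only the text fields; numeric data are copied untouched."""
    out = copy.deepcopy(chart)
    for field in TEXT_FIELDS:
        out[field] = translate(out[field])
    out["categories"] = [translate(c) for c in out["categories"]]
    return out


def passes_quality_check(src: dict, tgt: dict) -> bool:
    """Illustrative check (not the paper's procedure): values unchanged,
    same number of categories, and no empty text fields."""
    return (
        tgt["values"] == src["values"]
        and len(tgt["categories"]) == len(src["categories"])
        and all(tgt[f].strip() for f in TEXT_FIELDS)
        and all(c.strip() for c in tgt["categories"])
    )


# Dictionary lookup standing in for an LLM translation call (English -> Spanish).
TOY_ES = {
    "Annual rainfall": "Precipitación anual",
    "City": "Ciudad",
    "Rainfall (mm)": "Precipitación (mm)",
    "Paris": "París",
    "Rome": "Roma",
}


def toy_translate(text: str) -> str:
    return TOY_ES.get(text, text)  # untranslated strings (e.g. "Madrid") pass through


es_chart = translate_chart(seed_chart, toy_translate)
assert passes_quality_check(seed_chart, es_chart)
print(render(es_chart))
```

Because the rendering code never embeds language-specific text, it is written once and reused for every target language; only the data is translated. A structural check like this one catches dropped or altered values but says nothing about translation quality, which would need separate review.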

Paper Structure

This paper contains 54 sections, 20 figures, 16 tables.

Figures (20)

  • Figure 1: Example of inconsistent chart understanding by LVLMs. The model answers correctly in English but fails on the Hindi equivalent.
  • Figure 2: Overview of the PolyChartQA data pipeline. (a) The full workflow consists of two stages: Seed Data Preparation and Multilingual Chart Generation. (b) Quality control procedures applied during seed data generation. (c) Quality control procedures applied during the translation stage.
  • Figure 3: Multilingual chart question answering visualizations selected from PolyChartQA. First row, from left to right: Arabic, Bengali, Spanish, French. Second row, from left to right: Hindi, Japanese, Russian, Urdu.
  • Figure 4: Distribution of chart types in PolyChartQA.
  • Figure 5: Error analysis across error types, chart types, and question types.
  • ...and 15 more figures