SBS Figures: Pre-training Figure QA from Stage-by-Stage Synthesized Images

Risa Shinoda, Kuniaki Saito, Shohei Tanaka, Tosho Hirasawa, Yoshitaka Ushiku

TL;DR

The paper addresses the need for scalable, high-quality figure QA data by introducing SBS Figures, a dataset built with a fully synthetic, stage-by-stage pipeline that generates $1{,}000{,}000$ figure images with $4{,}200{,}000$ dense QA pairs and complete JSON annotations. The pipeline splits figure generation into three stages: creating the data and topic, rendering the figure with pre-defined Python code, and generating QA pairs with LLMs. This separation supports diverse, reproducible figures and avoids the code errors that arise when LLMs write the rendering code directly. Pre-training on SBS Figures improves results on real-world figure QA tasks (e.g., ChartQA) for both Donut and Pix2Struct backbones, outperforms other synthetic baselines, and generalizes to other datasets and models. The work provides a practical, copyright-free resource that reduces labeling costs and enables efficient learning for multi-modal chart understanding, with the pipelines, prompts, and models publicly released. The results indicate that carefully engineered synthetic data, paired with structured data representations, can serve as effective pre-training data for figure reasoning systems without manual annotation.

Abstract

Building a large-scale figure QA dataset requires a considerable amount of work, from gathering and selecting figures to extracting attributes like text, numbers, and colors, and generating QAs. Although recent developments in LLMs have led to efforts to synthesize figures, most of these focus primarily on QA generation. Additionally, creating figures directly using LLMs often encounters issues such as code errors, similar-looking figures, and repetitive content in figures. To address these issues, we present SBS Figures (Stage-by-Stage Synthetic Figures), a dataset for pre-training figure QA. Our proposed pipeline enables the creation of chart figures with complete annotations of the visualized data and dense QA annotations without any manual annotation process. Our stage-by-stage pipeline makes it possible to efficiently create figures with diverse topics and appearances while minimizing code errors. SBS Figures demonstrates a strong pre-training effect, making it possible to train efficiently with a limited amount of real-world chart data when starting from our pre-trained weights.

Paper Structure

This paper contains 17 sections, 8 figures, and 5 tables.

Figures (8)

  • Figure 1: SBS Figures (Stage-by-Stage Synthetic Figures). We create SBS Figures, a dataset for pre-training figure QA. Our stage-by-stage synthetic dataset creation enables a strong pre-training effect for real-world chart data.
  • Figure 2: Generation pipeline of SBS Figures. SBS Figures was created using a fully synthetic method. First, we generate the visualization data, represented in JSON format, containing complete numbers, text, and colors. Next, we produce figure images from this data using pre-defined, error-free Python scripts. Finally, we generate dense and accurate QA pairs from visualization data without the need for OCR.
  • Figure 3: Prompt templates used in the generation pipeline of SBS Figures. We adopt few-shot prompting to ensure consistent formatting for both JSON data and QA generation. To improve efficiency, our pipeline includes code that repeatedly adjusts the context and prompts during the generation process.
  • Figure 4: Example of SBS Figures QA pairs. The figures show diverse visual variations, with around 2,000 combinations of visual components available for each data content. Additionally, our pipeline generates dense and precise QA pairs that require complex reasoning skills to answer.
  • Figure 5: Theme distribution of SBS Figures. We randomly select 10 questions from each figure type and manually analyze the topic of the figure.
  • ...and 3 more figures
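
The three stages described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' released pipeline: the JSON schema, the function names `render_bar_chart_svg` and `generate_qa_pairs`, and the question templates are all hypothetical. The hard-coded JSON stands in for LLM-generated data, the template-based QA function stands in for LLM-based QA generation, and SVG output is used only to keep the example dependency-free.

```python
import json
from xml.sax.saxutils import escape

# Stage 1 (stand-in): in the paper an LLM generates the visualization data.
# Here a hypothetical JSON record is hard-coded for illustration.
DATA_JSON = json.dumps({
    "title": "Monthly Visitors",
    "x_label": "Month",
    "y_label": "Visitors",
    "categories": ["Jan", "Feb", "Mar"],
    "values": [120, 90, 150],
    "colors": ["#1f77b4", "#ff7f0e", "#2ca02c"],
})


def render_bar_chart_svg(data, width=320, height=240, bar_gap=10):
    """Stage 2: fixed rendering code; only the data and style arguments vary."""
    margin = 30
    n = len(data["values"])
    vmax = max(data["values"]) or 1  # avoid division by zero for all-zero data
    plot_w = width - 2 * margin
    plot_h = height - 2 * margin
    bar_w = (plot_w - bar_gap * (n - 1)) / n
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f'<text x="{width / 2}" y="20" text-anchor="middle">{escape(data["title"])}</text>',
    ]
    rows = zip(data["categories"], data["values"], data["colors"])
    for i, (cat, val, color) in enumerate(rows):
        bar_h = plot_h * val / vmax
        x = margin + i * (bar_w + bar_gap)
        y = height - margin - bar_h
        parts.append(
            f'<rect x="{x:.1f}" y="{y:.1f}" width="{bar_w:.1f}" '
            f'height="{bar_h:.1f}" fill="{color}"/>'
        )
        parts.append(
            f'<text x="{x + bar_w / 2:.1f}" y="{height - 10}" '
            f'text-anchor="middle">{escape(cat)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


def generate_qa_pairs(data):
    """Stage 3 (stand-in): answers are read from the JSON, so no OCR is needed."""
    cats, vals = data["categories"], data["values"]
    hi = max(range(len(vals)), key=vals.__getitem__)  # index of largest value
    lo = min(range(len(vals)), key=vals.__getitem__)  # index of smallest value
    x_name, y_name = data["x_label"].lower(), data["y_label"].lower()
    return [
        {"question": f"Which {x_name} has the highest {y_name}?",
         "answer": cats[hi]},
        {"question": f"What is the difference between the highest and lowest {y_name}?",
         "answer": str(vals[hi] - vals[lo])},
        {"question": f"What is the value for {cats[0]}?",
         "answer": str(vals[0])},
    ]


data = json.loads(DATA_JSON)
svg = render_bar_chart_svg(data)    # figure image as an SVG string
qa_pairs = generate_qa_pairs(data)  # QA annotations derived from the same JSON
```

Because the renderer is fixed and only its inputs change, a malformed figure can only come from malformed data, not from freshly generated plotting code. The figure and its QA pairs also share a single source of truth, so the annotations cannot drift from what is drawn.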