Chart Specification: Structural Representations for Incentivizing VLM Reasoning in Chart-to-Code Generation

Minggui He, Mingchen Dai, Jian Zhang, Yilun Liu, Shimin Tao, Pufan Zeng, Osamu Yoshie, Yuya Ieiri

TL;DR

This work introduces Chart Specification, a structured intermediate representation that encodes chart topology, data bindings, and runtime-derived numerics to align chart understanding with executable plotting code. By constructing ChartStruct, a structurally balanced training corpus, and employing a Spec-Align Reward within a reinforcement learning framework, the approach delivers dense, verifiable feedback that steers models toward faithful plotting logic. Across ChartMimic, Plot2Code, and ChartX benchmarks, the method achieves state-of-the-art results with strong data efficiency (notably with only 3K–4K training samples) and robust generalization to complex chart types. The combination of structure-aware supervision, runtime data grounding, and a hierarchical reward mechanism offers a scalable path to high-fidelity chart-to-code generation with practical impact for reproducibility and automated chart tooling.

Abstract

Vision-Language Models (VLMs) have shown promise in generating plotting code from chart images, yet achieving structural fidelity remains challenging. Existing approaches largely rely on supervised fine-tuning, encouraging surface-level token imitation rather than faithful modeling of underlying chart structure, which often leads to hallucinated or semantically inconsistent outputs. We propose Chart Specification, a structured intermediate representation that shifts training from text imitation to semantically grounded supervision. Chart Specification filters syntactic noise to construct a structurally balanced training set and supports a Spec-Align Reward that provides fine-grained, verifiable feedback on structural correctness, enabling reinforcement learning to enforce consistent plotting logic. Experiments on three public benchmarks show that our method consistently outperforms prior approaches. With only 3K training samples, we achieve strong data efficiency, surpassing leading baselines by up to 61.7% on complex benchmarks, and scaling to 4K samples establishes new state-of-the-art results across all evaluated metrics. Overall, our results demonstrate that precise structural supervision offers an efficient pathway to high-fidelity chart-to-code generation. Code and dataset are available at: https://github.com/Mighten/chart-specification-paper
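To make the idea of verifiable, structure-level feedback more concrete, the following is a minimal sketch of a staircase-style reward, loosely following the three phases named in the Figure 2 caption (Integrity, Semantic Topology, Code Metrics). It is not the authors' implementation: the `Spec` fields, the `numeric_score` helper, the 5% tolerance, and the 0.2 / 0.3 / 0.5 weights are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Hypothetical, heavily simplified stand-in for a Chart Specification."""
    executed: bool                               # did the candidate script run without error?
    chart_type: str = ""                         # semantic topology, e.g. "bar" or "pie"
    n_subplots: int = 1
    values: list = field(default_factory=list)   # runtime-derived numerics


def numeric_score(pred, ref, tol=0.05):
    """Fraction of values matched position-wise within a relative tolerance."""
    if not ref:
        return 1.0 if not pred else 0.0
    hits = sum(
        1 for p, r in zip(pred, ref)
        if abs(p - r) <= tol * max(abs(r), 1e-9)
    )
    # Dividing by the longer list penalizes both missing and extra values.
    return hits / max(len(pred), len(ref))


def staircase_reward(pred, ref):
    """Gate each phase on the previous one; weights are illustrative assumptions."""
    # Phase 1 (Integrity): a script that fails to run earns nothing.
    if not pred.executed:
        return 0.0
    reward = 0.2
    # Phase 2 (Semantic Topology): stop early if the structure disagrees.
    if pred.chart_type != ref.chart_type or pred.n_subplots != ref.n_subplots:
        return reward
    reward += 0.3
    # Phase 3 (Code Metrics): graded credit for matching numerics.
    reward += 0.5 * numeric_score(pred.values, ref.values)
    return reward
```

The gating is the point of the sketch: a candidate only receives fine-grained numeric credit once it has cleared the coarser structural checks, so the reward stays dense without rewarding code that plots the wrong kind of chart.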

Paper Structure (40 sections, 3 equations, 6 figures, 9 tables)

Figures (6)

  • Figure 1: Motivation for structure-aware chart reasoning. (Top) Direct chart-to-code models rely on surface-level imitation and often hallucinate structural dependencies. (Bottom) By explicitly modeling chart structure via Chart Specification, our approach enforces constraint-consistent plotting logic and faithful visual reconstruction.
  • Figure 2: Overview of our framework. (A) Specification-Driven Data Curation: Chart Specification ($\mathcal{S}$) is used to extract semantic intent ($\mathcal{S}_{sem}$) and physical execution data ($\mathcal{S}_{code}$) from raw scripts, and to guide the curation of the ChartStruct corpus. (B) Group Relative Policy Optimization: The VLM policy is optimized using group-based advantage estimation. (C) Hierarchical Reward Tree: A fine-grained reward mechanism validates candidates through a staircase pipeline, checking Integrity (Phase 1) and Semantic Topology (Phase 2) before calculating precise Code Metrics (Phase 3).
  • Figure 3: Visualization of our Chart Specification ($\mathcal{S}$) across four distinct chart types. The grey panels show $\mathcal{S}_{sem}$, which captures declarative intents such as topology and data domains. The orange panels (bottom-left and bottom-right) show $\mathcal{S}_{code}$, which uses runtime interception to capture implicit data, such as calculated wedge ratios in Ring charts or node-edge relationships in Network graphs.
  • Figure 4: Impact of training data scale on chart-to-code performance. We compare SFT and Spec-Align RL across varying data sizes on ChartMimic (a) and Plot2Code (b), evaluated using Pass Rate, Low-level accuracy, and Text-Match.
  • Figure 5: Type-wise performance comparison at the 3K data scale on ChartMimic. Each cell reports the average low-level accuracy for one chart type. Columns correspond to chart categories, and rows denote different training settings: Base (Qwen2.5VL-7B-Instruct), SFT (Standard Fine-Tuning), RL (no-CoT) (Spec-Align RL without reasoning), and RL (CoT) (Spec-Align RL with reasoning). The Overall column reports weighted averages over chart types.
  • ...and 1 more figure
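The Figure 2(B) caption refers to group-based advantage estimation. As a rough illustration, the sketch below uses the commonly cited GRPO normalization, in which each sampled candidate's reward is standardized against the mean and standard deviation of its own group. Whether the paper uses exactly this form is not stated in this section, so the function name, the `eps` smoothing term, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of candidates for the same chart.

    Assumed form: A_i = (r_i - mean(r)) / (std(r) + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampled group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four candidate scripts for one chart image, scored by some reward.
advantages = group_relative_advantages([0.0, 0.2, 0.5, 1.0])
```

Candidates that score above the group mean receive positive advantages and those below receive negative ones, so no separate value network is needed to provide a baseline. When every candidate in a group earns the same reward, all advantages are zero and that group contributes no gradient signal.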