ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation

Xuanle Zhao; Xianzhen Luo; Qi Shi; Chi Chen; Shuo Wang; Zhiyuan Liu; Maosong Sun

TL;DR

This work tackles the challenge of extracting dense information from charts by reframing chart understanding as chart-to-code generation. It introduces ChartCoder, the first chart-to-code MLLM that uses a Code LLM backbone, and Chart2Code-160k, a large-scale, diverse dataset of chart-code pairs, coupled with Snippet-of-Thought (SoT) to enable step-by-step reasoning in code synthesis. The approach yields strong performance against open-source baselines and even surpasses some proprietary models, highlighting the value of lossless code representations and specialized backbones for chart reasoning. These contributions offer a practical pathway to more accurate, executable chart analysis and synthesis in multimodal AI systems.

Abstract

Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in chart understanding tasks. However, interpreting charts with textual descriptions often leads to information loss, as it fails to fully capture the dense information embedded in charts. In contrast, parsing charts into code provides lossless representations that can effectively contain all critical details. Although existing open-source MLLMs have achieved success in chart understanding tasks, they still face two major challenges when applied to chart-to-code tasks: (1) low executability and poor restoration of chart details in the generated code and (2) a lack of large-scale and diverse training data. To address these challenges, we propose ChartCoder, the first dedicated chart-to-code MLLM, which leverages Code LLMs as the language backbone to enhance the executability of the generated code. Furthermore, we introduce Chart2Code-160k, the first large-scale and diverse dataset for chart-to-code generation, and propose the Snippet-of-Thought (SoT) method, which transforms direct chart-to-code generation data into step-by-step generation. Experiments demonstrate that ChartCoder, with only 7B parameters, surpasses existing open-source MLLMs on chart-to-code benchmarks, achieving superior chart restoration and code executability. Our code is available at https://github.com/thunlp/ChartCoder.
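
The abstract names low executability of generated code as one of the two main obstacles. As a rough illustration of how such a property could be quantified, the sketch below runs each candidate script in a separate interpreter and reports the fraction that exit without error. This is an assumption-laden example, not the evaluation protocol of the paper or of any benchmark it uses; the names `runs_cleanly` and `execution_rate` are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def runs_cleanly(code: str, timeout_s: float = 10.0) -> bool:
    """Return True if `code` exits with status 0 when run as a standalone script."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(code, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,          # keep any files the script writes inside the temp dir
                capture_output=True,  # swallow stdout/stderr of the candidate
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # A script that hangs is counted as a failure.
            return False
        return result.returncode == 0


def execution_rate(candidates: list[str]) -> float:
    """Fraction of candidate scripts that run without error (0.0 for an empty list)."""
    if not candidates:
        return 0.0
    return sum(runs_cleanly(c) for c in candidates) / len(candidates)
```

In a chart-to-code setting the candidates would typically be plotting scripts produced by a model. Because such code is untrusted, a real evaluation would likely run it in a sandbox rather than a plain subprocess, and would also compare the rendered chart against the reference, which this sketch does not attempt.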

Paper Structure (23 sections, 8 figures, 10 tables)

Introduction
Related Works
Chart Understanding
MLLMs For Code
Chart2Code-160k Dataset
Direct Chart-to-code Generation
Step-by-step Chart-to-code Generation
Dataset Analysis
ChartCoder Model
Model Architecture
Model Training
Experiments
Baselines and Benchmarks
Main Results
Ablation Study
...and 8 more sections

Figures (8)

Figure 1: Comparison of existing MLLMs' performance on the ChartQA and ChartMimic benchmarks. In the chart-to-code task, open-source MLLMs struggle with mismatches in chart types and sizes, whereas ChartCoder generates accurate code.
Figure 2: Illustration of the Chart2Code dataset generation process and the ChartCoder training process. The dataset generation process is divided into two stages: direct generation and step-by-step generation. In the step-by-step generation, the code processed by the Snippet-of-Thought method is sampled from the Chart2Code-160k data generated in the direct generation process. The training process of ChartCoder also consists of two stages: alignment and instruction tuning.
Figure 3: Generated charts of different model outputs after code execution. Our proposed ChartCoder performs significantly better than InternVL2-8B of a similar model scale.
Figure 4: Comparison of error types on ChartMimic direct generation tasks with code and general LLMs as language backbone, respectively.
Figure 5: A case study comparing the outputs obtained when using image only versus image+code as the inputs.
...and 3 more figures
