OpusAnimation: Code-Based Dynamic Chart Generation

Bozheng Li, Miao Yang, Zhenhan Chen, Jiawang Cao, Mushui Liu, Yi Lu, Yongliang Wu, Bin Zhang, Yangguang Ji, Licheng Tang, Jay Wu, Wenbo Zhu

TL;DR

The paper addresses the gap in dynamic, code-based chart generation by introducing DCG-Bench and the DCG-8K dataset to evaluate multimodal LLMs on three tasks: Detailed Text-to-Chart (D2C), Simple Text-to-Chart (S2C), and Video-to-Chart (V2C). It proposes a two-stage training approach, supervised fine-tuning (SFT) followed by group relative policy optimization (GRPO) with a Joint-Code-Visual Reward, to build an expert MLLM for DCG. The resulting 3B-parameter model outperforms the best open-source MLLM and performs on par with proprietary models. Results show notable improvements in both code and video scores across tasks, with GRPO aiding generalization and a multi-modal reward signal mitigating memorization. The work provides a new benchmark and training methodology that can advance practical dynamic chart generation, while acknowledging its dependence on a proprietary reward model and outlining future improvements.

Abstract

Dynamic Chart Generation (DCG) involves producing code-rendered animated visualizations as charts. While recent advances in multi-modal large language models (MLLMs) have significantly improved their capability on static chart generation and comprehension, MLLMs' potential for handling dynamic chart generation and understanding remains underexplored. To bridge this research gap, we introduce DCG-Bench (Dynamic Chart Generation Benchmark), the first benchmark evaluating MLLMs' capability on dynamic chart generation tasks from three dimensions: Simple Text-to-Chart, Detailed Text-to-Chart, and Video-to-Chart tasks. We construct DCG-8K, a high-quality DCG dataset with annotations covering instruction-code-video triplets and QA pairs for both code and video evaluation. Based on DCG-8K, we explore a two-stage training recipe, proposing a Joint-Code-Visual Reward for group relative policy optimization to construct the expert MLLM Qwen2.5-VL-DCG-3B for the DCG task. Our benchmarking results reveal shortcomings of existing MLLMs in the visual-to-chart task. Our model beats the best open-source MLLM with an average 8.31% performance gain across the three tasks and, with only 3B parameters, performs on par with proprietary models, proving the effectiveness of our training recipe. Our code and dataset will be publicly available.
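
As a rough illustration of the second training stage, the sketch below blends a code score and a video score into one reward and then normalizes rewards within a group of sampled completions, which is the group-relative step that gives GRPO its name. The weighted-sum form, the `code_weight` parameter, the function names, and the sample scores are all assumptions made for illustration; the abstract does not specify how the Joint-Code-Visual Reward is actually computed.

```python
from statistics import mean, pstdev


def joint_reward(code_score, video_score, code_weight=0.5):
    """Blend a code score and a video score into one scalar reward.

    Assumption: a simple weighted sum. code_weight=1.0 uses only the
    code score, code_weight=0.0 uses only the video score.
    """
    if not 0.0 <= code_weight <= 1.0:
        raise ValueError("code_weight must lie in [0, 1]")
    return code_weight * code_score + (1.0 - code_weight) * video_score


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled completions.

    Each advantage is (reward - group mean) / (group std + eps).
    The population standard deviation is used here; implementations
    differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical (code_score, video_score) pairs for four completions
# sampled from the same prompt.
samples = [(0.9, 0.4), (0.6, 0.8), (0.2, 0.1), (0.7, 0.7)]
rewards = [joint_reward(c, v, code_weight=0.5) for c, v in samples]
advantages = group_relative_advantages(rewards)

for (c, v), r, a in zip(samples, rewards, advantages):
    print(f"code={c:.1f} video={v:.1f} reward={r:.2f} advantage={a:+.2f}")
```

Completions that score above the group mean receive a positive advantage and those below receive a negative one, so no separate value model is needed to provide a baseline. Varying `code_weight` between 1.0 and 0.0 corresponds loosely to sweeping the code:video reward ratio.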

Paper Structure

This paper contains 21 sections, 3 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Illustration of the Dynamic Chart Generation task and a failure case of an existing MLLM on the V2C task.
  • Figure 2: Demonstration of the three DCG task types and our data curation pipeline. Left: The three task types (Detailed Text-to-Chart, Simple Text-to-Chart, and Video-to-Chart) in DCG-Bench. Right: Dataset curation pipeline from raw ECharts code to the final DCG-8K dataset.
  • Figure 3: Training recipe of Qwen2.5-VL-DCG, including SFT and JCV-GRPO training.
  • Figure 4: Generalization comparison between SFT and JCV-GRPO on the D2C, S2C, and V2C tasks across the Execute Rate, Code Score, and Video Score metrics.
  • Figure 5: Ablation study of the reward score ratio in JCV-GRPO training, covering code:video reward ratios from 1:0 to 0:1.
  • ...and 3 more figures