Text2Chart31: Instruction Tuning for Chart Generation with Automatic Feedback
Fatemeh Pesaran Zadeh, Juyeon Kim, Jin-Hwa Kim, Gunhee Kim
TL;DR
The paper addresses the challenge of generating diverse, accurate charts from natural language and data by identifying gaps in existing chart-generation datasets and training paradigms. It introduces Text2Chart31, a dataset built with a hierarchical data-generation pipeline, together with an RL-based instruction-tuning framework that uses automatic feedback in the form of preference and alignment rewards. Across three tasks (Description-to-Chart, Raw Data-to-Chart, and Code-to-Description), the approach yields state-of-the-art performance among open-source models and results competitive with proprietary systems, and it does particularly well on underrepresented chart types. The work provides a scalable dataset and a practical RL-based training recipe that improves LLM-driven data visualization, with implications for broader multi-modal instruction-tuning and chart-automation workflows.
Abstract
Large language models (LLMs) have demonstrated strong capabilities across various language tasks, notably through instruction-tuning methods. However, LLMs face challenges in visualizing complex, real-world data through charts and plots. First, existing datasets rarely cover the full range of chart types and often omit categories such as 3D, volumetric, and gridded charts. Second, supervised fine-tuning methods do not fully leverage the intricate relationships within rich datasets, including text, code, and figures. To address these challenges, we propose a hierarchical pipeline and a new dataset for chart generation. Our dataset, Text2Chart31, includes 31 unique plot types from the Matplotlib library, with 11.1K tuples of descriptions, code, data tables, and plots. Moreover, we introduce a reinforcement learning-based instruction tuning technique for chart generation tasks that does not require human feedback. Our experiments show that this approach significantly enhances model performance, enabling smaller models to outperform larger open-source models and to perform comparably to state-of-the-art proprietary models in data visualization tasks. We make the code and dataset available at https://github.com/fatemehpesaran310/Text2Chart31.
