$C^2$: Scalable Auto-Feedback for LLM-based Chart Generation
Woosung Koh, Jang Han Yoon, MinHyung Lee, Youngjin Song, Jaegwan Cho, Jaehyun Kang, Taehyeon Kim, Se-Young Yun, Youngjae Yu, Bongshin Lee
TL;DR
The paper addresses the challenge of evaluating and scaling LLM-based chart generation by introducing $C^2$, a scalable framework composed of ChartAF, a reference-free automatic feedback provider, and ChartUIE-8K, a large-scale dataset that emulates chart user interactions. ChartAF comprises ChartAF-S, which produces scalar evaluations, and ChartAF-G, which produces granular natural-language (NL) feedback. Together they enable test-time scaling and in-context tuning without parameter updates. Compared with existing benchmarks, ChartUIE-8K increases the number of queries, datasets, and chart types by 5982%, 1936%, and 91%, respectively, and aligns well with real-world use: in a user study, 94% of participants preferred ChartUIE-8K's queries and 93% found them realistic. Collectively, $C^2$ offers scalable, open-source pathways to evaluating and generating high-quality charts with LLMs while reducing reliance on costly human curation and promoting realistic, broad data coverage.
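Scalar and granular feedback are said to enable test-time scaling and in-context tuning without parameter updates. The Python sketch below illustrates two generic patterns under stated assumptions. The first is best-of-N selection driven by a scalar judge, which is one plausible role for a scalar evaluator such as ChartAF-S. The second is a refinement loop that appends NL feedback to the generator's context, which is one plausible role for a granular critic such as ChartAF-G. The mapping of each pattern to ChartAF-S or ChartAF-G is our assumption. All names (`best_of_n`, `refine_in_context`, the `toy_*` stubs) and signatures are hypothetical illustrations, not the authors' implementation or API.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not the C^2 API):
Sampler = Callable[[str, int], str]        # (query, seed) -> candidate chart code
Generator = Callable[[str, str], str]      # (query, accumulated feedback) -> chart code
ScalarJudge = Callable[[str, str], float]  # (query, chart code) -> quality score
Critic = Callable[[str, str], str]         # (query, chart code) -> NL feedback, "" if none


def best_of_n(query: str, sample: Sampler, judge: ScalarJudge, n: int) -> Tuple[str, float]:
    """Test-time scaling: draw n candidates and keep the one the scalar judge scores highest."""
    scored = [(code, judge(query, code)) for code in (sample(query, seed) for seed in range(n))]
    return max(scored, key=lambda pair: pair[1])


def refine_in_context(query: str, generate: Generator, critic: Critic,
                      rounds: int) -> Tuple[str, List[str]]:
    """In-context refinement: append NL feedback to the prompt context; no weights are updated."""
    feedback_log: List[str] = []
    code = generate(query, "")
    for _ in range(rounds):
        feedback = critic(query, code)
        if not feedback:  # the critic has nothing left to flag
            break
        feedback_log.append(feedback)
        code = generate(query, "\n".join(feedback_log))
    return code, feedback_log


# --- Hypothetical toy stubs standing in for LLM calls (purely illustrative) ---
def toy_sampler(query: str, seed: int) -> str:
    return f"variant-{seed}"


def toy_judge(query: str, code: str) -> float:
    # Pretend variant 2 is the best chart; the score drops with distance from it.
    return float(-abs(int(code.split("-")[1]) - 2))


def toy_generator(query: str, context: str) -> str:
    # Apply every fix requested so far in the accumulated feedback.
    code = "plt.plot(x, y)"
    if "x-axis label" in context:
        code += "\nplt.xlabel('x')"
    if "title" in context:
        code += "\nplt.title('Sales')"
    return code


def toy_critic(query: str, code: str) -> str:
    # Flag one missing element at a time; return "" once the chart looks complete.
    if "xlabel" not in code:
        return "Add an x-axis label."
    if "title" not in code:
        return "Add a title."
    return ""


if __name__ == "__main__":
    print(best_of_n("bar chart of sales", toy_sampler, toy_judge, n=5))
    final_code, log = refine_in_context("bar chart of sales", toy_generator, toy_critic, rounds=5)
    print(log)
    print(final_code)
```

In a real pipeline the stubs would be replaced by model calls. Both loops change only which output is kept or what goes into the prompt, so the generator's parameters stay fixed, consistent with the "no parameter updates" framing.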
Abstract
Generating high-quality charts with Large Language Models (LLMs) is challenging because data are limited and scaling through human curation is costly. $\langle \text{instruction}, \text{data}, \text{code} \rangle$ triplets are scarce and expensive to curate manually, as creating them demands technical expertise. To address this scalability challenge, we introduce a reference-free automatic feedback generator that eliminates the need for costly human intervention. Our novel framework, $C^2$, consists of (1) an automatic feedback provider (ChartAF) and (2) a diverse, reference-free dataset (ChartUIE-8K). The results are compelling. In our first experiment, 74% of respondents strongly preferred, and 10% preferred, the results after feedback. A second post-feedback experiment shows that ChartAF outperforms nine baselines. Moreover, ChartUIE-8K significantly improves data diversity, increasing the number of queries, datasets, and chart types by 5982%, 1936%, and 91%, respectively, over existing benchmarks. Finally, a study of LLM users revealed that 94% of participants preferred ChartUIE-8K's queries, with 93% deeming them aligned with real-world use cases. Core contributions are open-sourced at chartsquared.github.io, along with ample qualitative examples.
