PlotGen: Multi-Agent LLM-based Scientific Data Visualization via Multimodal Feedback
Kanika Goswami, Puneet Mathur, Ryan Rossi, Franck Dernoncourt
TL;DR
PlotGen tackles the challenge of enabling novices to generate accurate scientific visualizations by orchestrating a multi-agent LLM system that decomposes requests, generates executable Python visualization code, and refines outputs through multimodal self-reflection. The framework combines a Query Planning Agent, a Code Generation Agent, and Numeric, Lexical, and Visual Feedback Agents that validate data, labels, and aesthetics, respectively, using multimodal LLMs. Empirical results on the MatPlotBench benchmark show PlotGen surpassing strong baselines, with the best performance achieved using GPT-4 and all three feedback channels, and ablation studies confirming the importance of each feedback component. User studies further indicate enhanced trust and reduced debugging time, suggesting PlotGen's practical potential for democratizing access to advanced data visualization tools.
Abstract
Scientific data visualization is pivotal for transforming raw data into comprehensible visual representations, enabling pattern recognition, forecasting, and the presentation of data-driven insights. However, novice users often face difficulties due to the complexity of selecting appropriate tools and mastering visualization techniques. Large Language Models (LLMs) have recently demonstrated potential in assisting code generation, though they struggle with accuracy and require iterative debugging. In this paper, we propose PlotGen, a novel multi-agent framework aimed at automating the creation of precise scientific visualizations. PlotGen orchestrates multiple LLM-based agents, including a Query Planning Agent that breaks down complex user requests into executable steps, a Code Generation Agent that converts pseudocode into executable Python code, and three retrieval feedback agents (a Numeric Feedback Agent, a Lexical Feedback Agent, and a Visual Feedback Agent) that leverage multimodal LLMs to iteratively refine the data accuracy, textual labels, and visual correctness of generated plots via self-reflection. Extensive experiments show that PlotGen outperforms strong baselines, achieving a 4-6 percent improvement on the MatPlotBench dataset. This leads to enhanced user trust in LLM-generated visualizations and improved novice productivity, owing to less time spent debugging plot errors.
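The control flow described above can be illustrated with a minimal sketch. It is not the PlotGen implementation: every function name is hypothetical, each agent is a deterministic stub standing in for an LLM (or multimodal LLM) call, and both the numeric, lexical, visual ordering and the `max_rounds` cap are assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins: in a real system each function would wrap an LLM
# (or multimodal LLM) call. None of these names come from the paper.
Feedback = Optional[str]                 # None means "no issue found"
FeedbackAgent = Callable[[str], Feedback]


def plan_query(request: str) -> List[str]:
    """Query Planning Agent stub: split a request into ordered steps."""
    return [step.strip() for step in request.split(",") if step.strip()]


def generate_code(steps: List[str], fixes: List[str]) -> str:
    """Code Generation Agent stub: emit one line per step and per applied fix."""
    lines = [f"# step: {s}" for s in steps] + [f"# fix: {f}" for f in fixes]
    return "\n".join(lines)


def numeric_feedback(code: str) -> Feedback:
    """Numeric Feedback Agent stub: flags the draft until a data fix is present."""
    return None if "fix: data" in code else "data"


def lexical_feedback(code: str) -> Feedback:
    """Lexical Feedback Agent stub: flags the draft until a labels fix is present."""
    return None if "fix: labels" in code else "labels"


def visual_feedback(code: str) -> Feedback:
    """Visual Feedback Agent stub: flags the draft until a style fix is present."""
    return None if "fix: style" in code else "style"


def run_pipeline(
    request: str, agents: List[FeedbackAgent], max_rounds: int = 3
) -> Tuple[str, List[str]]:
    """Plan, generate, then let each feedback agent refine the draft in turn."""
    steps = plan_query(request)
    fixes: List[str] = []
    code = generate_code(steps, fixes)
    for agent in agents:                 # assumed order: numeric, lexical, visual
        for _ in range(max_rounds):      # assumed cap on refinement rounds
            issue = agent(code)
            if issue is None:            # this agent is satisfied; move on
                break
            fixes.append(issue)
            code = generate_code(steps, fixes)  # regenerate with the feedback
    return code, fixes
```

Calling `run_pipeline("load csv, plot line", [numeric_feedback, lexical_feedback, visual_feedback])` walks the draft through all three feedback stages. Keeping each agent behind the same `Callable[[str], Feedback]` shape is a design choice of this sketch that makes it easy to drop a channel, in the spirit of the ablations mentioned in the TL;DR.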
