Automated Visualization Code Synthesis via Multi-Path Reasoning and Feedback-Driven Optimization

Wonduk Seo, Daye Kang, Hyunjin An, Taehan Kim, Soohyuk Cho, Seungyong Lee, Minhyeong Yu, Jian Park, Yi Bu, Seunghyun Lee

TL;DR

This work tackles underspecified data visualization requests by introducing VisPath, a framework that combines multi-path reasoning with feedback-driven optimization to generate robust visualization code from natural language. It expands the input into multiple plausible concretizations via Chain-of-Thought prompting, executes the candidate scripts, and uses a vision-language feedback loop to synthesize a final program grounded in the dataset context. Empirical results on MatPlotBench and the Qwen-Agent Code Interpreter Benchmark show VisPath achieving higher Plot Scores and Executable Rates than baselines, with ablations confirming the benefits of path diversity and multimodal feedback. The approach promises more reliable, interpretable, and automation-friendly visualization generation, with practical impact on BI, data science, and automated reporting tasks.

Abstract

Large Language Models (LLMs) have become a cornerstone for automated visualization code generation, enabling users to create charts through natural language instructions. Despite improvements from techniques like few-shot prompting and query expansion, existing methods often struggle when requests are underspecified in actionable details (e.g., data preprocessing assumptions, solver or library choices), frequently necessitating manual intervention. To overcome these limitations, we propose VisPath: a Multi-Path Reasoning and Feedback-Driven Optimization Framework for Visualization Code Generation. VisPath handles underspecified queries through structured, multi-stage processing. It begins by using Chain-of-Thought (CoT) prompting to reformulate the initial user input, generating multiple extended queries in parallel to surface alternative plausible concretizations of the request. Each extended query is then used to generate a candidate visualization script, and the scripts are executed to produce diverse images. By assessing the visual quality and correctness of each output, VisPath generates targeted feedback that is aggregated to synthesize an optimal final result. Extensive experiments on MatPlotBench and the Qwen-Agent Code Interpreter Benchmark show that VisPath outperforms state-of-the-art methods, providing a more reliable framework for AI-driven visualization generation.
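The stages described in the abstract can be sketched as a small orchestration loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Candidate`, `run_script`, and `vispath_sketch`, and the four callables `expand`, `generate`, `critique`, and `synthesize`, are hypothetical stand-ins for the LLM and vision-language model calls. As a simplification, `critique` here sees only the code and whether it ran, whereas the described system assesses the rendered image. The default of three paths follows the $K=3$ setting reported as the best overall balance in Figure 2.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List


@dataclass
class Candidate:
    query: str      # one expanded ("concretized") version of the user request
    code: str       # visualization script generated for that query
    executed: bool  # whether the script ran without error
    feedback: str   # critique of the result (or of the failure)


def run_script(code: str, timeout: float = 60.0) -> bool:
    """Run a candidate script in a scratch directory; True if it exits cleanly."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(code, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0


def vispath_sketch(
    user_request: str,
    expand: Callable[[str, int], List[str]],        # hypothetical: CoT query expansion
    generate: Callable[[str], str],                 # hypothetical: query -> script
    critique: Callable[[str, str, bool], str],      # hypothetical: feedback model
    synthesize: Callable[[str, List[Candidate]], object],  # hypothetical: final merge
    k: int = 3,
    execute: Callable[[str], bool] = run_script,
):
    """Expand the request into k paths, run each script, gather feedback, synthesize."""
    candidates = []
    for query in expand(user_request, k):
        code = generate(query)
        ok = execute(code)
        candidates.append(Candidate(query, code, ok, critique(query, code, ok)))
    # The synthesis step sees every path with its feedback, including failures.
    return synthesize(user_request, candidates)
```

Passing the model calls in as plain callables keeps the control flow testable with stubs; in practice each callable would wrap a prompt to whichever model is in use, and `execute` would also return the rendered image for the feedback step.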

Paper Structure

This paper contains 26 sections, 5 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Overview of the proposed VisPath framework for robust visualization code generation. The framework consists of a Multi-Path Agent, a Visual Feedback Agent, and a Synthesis Agent.
  • Figure 2: Effect of varying the number of reasoning paths $K$ on performance across datasets and models. Metrics are Plot Score and Executable Rate. The results show that $K=3$ achieves the best overall balance, with larger $K$ values reducing performance.
  • Figure 3: (a) Scatter plot generation with explicit spatial and annotation constraints. User query: “Create a scatter plot of two distinct sets of random data, each containing 150 points. The first set (Group X) should be centered around (-2, -2) and visualized in blue, and the second set (Group Y) should be centered around (2, 2) and visualized in orange. Label each group at their respective centers with a round white box (...) ” (b) Large-scale time-series visualization task. User query: “Visualize a large number of time series in three different ways.” This task evaluates each method’s ability to interpret an underspecified request, select appropriate plotting strategies, and compose multiple complementary visualizations for dense temporal data. GT denotes the ground-truth visualization.