CoDA: Agentic Systems for Collaborative Data Visualization

Zichen Chen, Jiefeng Chen, Sercan Ö. Arik, Misha Sra, Tomas Pfister, Jinsung Yoon

TL;DR

CoDA reframes automated data visualization as a collaborative, multi-agent problem, decomposing the workflow into understanding, planning, generation, and self-reflection to manage complex, multi-file datasets and iterative refinement. By leveraging specialized agents and metadata-driven preprocessing, CoDA bypasses token limits and improves robustness, achieving substantial improvements in overall score across MatplotBench, Qwen Code Interpreter, and DA-Code benchmarks. The approach demonstrates strong, backbone-agnostic performance with high execution reliability and visualization quality, highlighting the value of persistent collaboration and self-evaluation in automated data visualization. This work advances practical visualization automation, offering a scalable framework that can adapt to diverse data landscapes while acknowledging computational overhead and opportunities for future efficiency-driven refinements.

Abstract

Deep research has revolutionized data analysis, yet data scientists still devote substantial time to manually crafting visualizations, highlighting the need for robust automation from natural language queries. However, current systems struggle with complex datasets containing multiple files and iterative refinement. Existing approaches, including simple single- or multi-agent systems, often oversimplify the task, focusing on initial query parsing while failing to robustly manage data complexity, code errors, or final visualization quality. In this paper, we reframe this challenge as a collaborative multi-agent problem. We introduce CoDA, a multi-agent system that employs specialized LLM agents for metadata analysis, task planning, code generation, and self-reflection. We formalize this pipeline, demonstrating how metadata-focused analysis bypasses token limits and quality-driven refinement ensures robustness. Extensive evaluations show CoDA achieves substantial gains in the overall score, outperforming competitive baselines by up to 41.5%. This work demonstrates that the future of visualization automation lies not in isolated code generation but in integrated, collaborative agentic workflows.
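The abstract's claim that metadata-focused analysis bypasses token limits can be illustrated with a small sketch. The idea shown here is to hand an LLM agent a compact description of each file (column names, coarse types, row count, a few sample rows) rather than the raw contents. This is not the authors' implementation: `infer_type`, `summarize_csv`, the type heuristic, and the `sales.csv` example data are all hypothetical names and choices made for illustration.

```python
import csv
import io
import json


def infer_type(values):
    """Guess a coarse column type from string cells (hypothetical heuristic)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "empty"
    for caster, label in ((int, "int"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return label
        except ValueError:
            continue
    return "str"


def summarize_csv(name, text, sample_rows=2):
    """Return a compact metadata record for one CSV file.

    Only this record, not the full table, would be placed in a prompt,
    so prompt size stays roughly constant as the file grows.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []
    return {
        "file": name,
        "n_rows": len(rows),
        "columns": {c: infer_type([(r[c] or "") for r in rows]) for c in columns},
        "sample": rows[:sample_rows],
    }


# Illustrative data only; not taken from any benchmark in the paper.
sales = "region,units,price\nnorth,10,2.5\nsouth,7,3.0\neast,12,2.75\n"
meta = summarize_csv("sales.csv", sales)
print(json.dumps(meta, indent=2))
```

With several input files, one such record per file could be concatenated into the prompt, which keeps multi-file datasets within a fixed budget regardless of how many rows each file holds.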

Paper Structure

This paper contains 37 sections, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: Qualitative comparison of visualizations generated by baselines (MatplotAgent, VisPath, CoML4VIS) and CoDA. Provided with a natural language query and data files (if any), models produce code to create plots. CoDA yields outputs that more faithfully capture complex patterns, chart types, and aesthetics, while baselines often fail on ambiguity, 3D structures, or multi-source integration.
  • Figure 2: Overview of the CoDA framework for agentic data visualization. The workflow decomposes natural language queries into modular phases: Understanding (query intent and data metadata extraction), Planning (example code search, visual mappings, and design optimization), Generation (code generation and debugging), and Self-Reflection (quality evaluation with feedback loops for iterative refinement).
  • Figure 3: Ablation results. (a) Performance (EPR, VSR, OS) across different iteration counts. (b) Comparison of EPR, VSR, and OS with vs. without Global TODO. (c) Comparison of EPR, VSR, and OS with vs. without the Search Agent.
  • Figure 4: Comparison between our generated visualization and the ground truth. The results demonstrate that our system faithfully reproduces the intended trends, achieving an exact match with the reference output (score: 100/100).
  • Figure 5: Comparison between our generated visualization and the ground truth for the Steam dataset. The results indicate that our approach successfully integrates multiple heterogeneous tables and reproduces the intended visualization with complete fidelity (score: 100/100).
  • ...and 1 more figure
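The phases named in the Figure 2 caption (Understanding, Planning, Generation, Self-Reflection) suggest a loop in which generated code is scored and the evaluator's notes are fed back into the next generation attempt. The sketch below shows one way such a loop might be wired, under stated assumptions: every function name, the 0.8 threshold, the iteration cap, and the toy stand-ins for the agents are hypothetical, and the stubs replace what would be LLM calls in a real system. It is not the paper's algorithm.

```python
def run_pipeline(query, metadata, plan_fn, generate_fn, evaluate_fn,
                 max_iters=3, threshold=0.8):
    """Plan once, then generate and evaluate until the score is good enough.

    The three *_fn arguments stand in for agents; here they are plain
    callables so the control flow can be run without any model.
    """
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    plan = plan_fn(query, metadata)
    feedback = []
    history = []
    for iteration in range(1, max_iters + 1):
        code = generate_fn(plan, feedback)
        score, notes = evaluate_fn(query, code)
        history.append((iteration, score))
        if score >= threshold:
            break
        # Carry the evaluator's notes into the next generation attempt.
        feedback = feedback + notes
    return {"plan": plan, "code": code, "score": score, "history": history}


# Toy stand-ins for the agents (assumptions for illustration only).
def toy_plan(query, metadata):
    return [f"plot {metadata['y']} against {metadata['x']}"]


def toy_generate(plan, feedback):
    lines = ["import matplotlib.pyplot as plt", "plt.plot(x, y)"]
    if "add a title" in feedback:
        lines.append("plt.title('Units by region')")
    return "\n".join(lines)


def toy_evaluate(query, code):
    # The generated code is inspected as text, never executed, in this sketch.
    if "plt.title" in code:
        return 1.0, []
    return 0.5, ["add a title"]


result = run_pipeline(
    "Plot units by region",
    {"x": "region", "y": "units"},
    toy_plan, toy_generate, toy_evaluate,
)
print(result["history"])  # [(1, 0.5), (2, 1.0)]
```

In this toy run the first attempt scores 0.5, the evaluator asks for a title, and the second attempt passes the threshold, so the loop stops after two iterations. Capping the number of iterations bounds the extra computation that the refinement loop introduces.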