CoDA: Agentic Systems for Collaborative Data Visualization
Zichen Chen, Jiefeng Chen, Sercan Ö. Arik, Misha Sra, Tomas Pfister, Jinsung Yoon
TL;DR
CoDA reframes automated data visualization as a collaborative multi-agent problem, decomposing the workflow into understanding, planning, generation, and self-reflection so that it can handle complex, multi-file datasets and iterative refinement. Specialized agents and metadata-driven preprocessing let CoDA bypass token limits and improve robustness, with overall-score gains of up to 41.5% over competitive baselines in evaluations spanning the MatplotBench, Qwen Code Interpreter, and DA-Code benchmarks. Performance is strong regardless of the backbone model, with high execution reliability and visualization quality, which highlights the value of persistent collaboration and self-evaluation in automated data visualization. The framework adapts to diverse data settings, though at the cost of added computational overhead, which leaves room for future efficiency-driven refinements.
Abstract
Deep research has revolutionized data analysis, yet data scientists still devote substantial time to manually crafting visualizations, highlighting the need for robust automation from natural language queries. However, current systems struggle with complex datasets containing multiple files and iterative refinement. Existing approaches, including simple single- or multi-agent systems, often oversimplify the task, focusing on initial query parsing while failing to robustly manage data complexity, code errors, or final visualization quality. In this paper, we reframe this challenge as a collaborative multi-agent problem. We introduce CoDA, a multi-agent system that employs specialized LLM agents for metadata analysis, task planning, code generation, and self-reflection. We formalize this pipeline, demonstrating how metadata-focused analysis bypasses token limits and quality-driven refinement ensures robustness. Extensive evaluations show CoDA achieves substantial gains in the overall score, outperforming competitive baselines by up to 41.5%. This work demonstrates that the future of visualization automation lies not in isolated code generation but in integrated, collaborative agentic workflows.
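To illustrate why metadata-focused analysis can sidestep token limits, here is a minimal sketch of the general idea: each data file is reduced to a compact description (schema, simple statistics, a few sample rows) whose size is independent of the number of rows, and only that description would be passed to a language model. This is not CoDA's implementation; the function name `summarize_table`, its parameters, and the summary fields are hypothetical and chosen only for illustration.

```python
import csv
import io
from typing import Any


def _is_number(value: str) -> bool:
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def summarize_table(name: str, text: str, sample_rows: int = 2) -> dict[str, Any]:
    """Build a compact description of one CSV table (hypothetical helper).

    The result holds the schema, per-column statistics, and a few sample
    rows, so its size does not grow with the number of rows in the file.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])

    summary: dict[str, Any] = {
        "file": name,
        "n_rows": len(rows),
        "columns": {},
        "sample": rows[:sample_rows],
    }
    for col in columns:
        # Skip missing cells (empty strings, or None for short rows).
        values = [r[col] for r in rows if r[col] not in ("", None)]
        if values and all(_is_number(v) for v in values):
            nums = [float(v) for v in values]
            summary["columns"][col] = {
                "type": "numeric",
                "min": min(nums),
                "max": max(nums),
            }
        else:
            summary["columns"][col] = {
                "type": "text",
                "n_unique": len(set(values)),
            }
    return summary


# Example input: a tiny in-memory table standing in for a large file.
demo = "city,temp\nOslo,3.5\nCairo,30\n"
meta = summarize_table("weather.csv", demo)
```

In a multi-file setting, one such summary per file could be concatenated into a prompt for a planning step, leaving the raw data on disk for the generated plotting code to read at execution time.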
