Debugging Defective Visualizations: Empirical Insights Informing a Human-AI Co-Debugging System
Shuyu Shen, Sirong Lu, Leixian Shen, Yuyu Luo
TL;DR
This work investigates how defective Vega-Lite visualizations are debugged, comparing forum-based human responses, zero-shot AI approaches, and a proposed mixed-initiative human-AI co-debugging system. Using a dataset of 297 Vega-Lite debugging cases from Stack Overflow and a 36-case user study, the authors show that humans provide accurate solutions but are slow, while AI offers rapid guidance that can misalign with user intent; a hybrid approach resolves 86% of cases, outperforming each modality alone. The paper derives design implications (multimodal clarification, retrieval-augmented generation, real-time code preview, and hybrid feedback) and implements a co-debugging system that integrates these modules to support iterative, visually grounded debugging. The results suggest substantial practical benefits for visualization authors and point toward adaptable, cross-ecosystem tooling that blends human judgment with AI-generated exploration and grounding.
Abstract
Visualization authoring is an iterative process in which users adjust parameters to achieve the desired aesthetics. Due to its complexity, users often create defective visualizations and struggle to fix them. Many seek help on forums (e.g., Stack Overflow), while others turn to AI, yet little is known about the strengths and limitations of these approaches or how they can be effectively combined. We analyze Vega-Lite debugging cases from Stack Overflow, categorizing the types of questions askers pose, evaluating human responses, and assessing AI performance. Guided by these findings, we design a human-AI co-debugging system that combines LLM-generated suggestions with forum knowledge. We evaluate this system in a user study on 36 unresolved problems, comparing it with forum answers and LLM baselines. Our results show that while forum contributors provide accurate but slow solutions and LLMs offer immediate but sometimes misaligned guidance, the hybrid system resolves 86% of cases, a higher rate than either alone.
