Debugging Defective Visualizations: Empirical Insights Informing a Human-AI Co-Debugging System

Shuyu Shen, Sirong Lu, Leixian Shen, Yuyu Luo

TL;DR

This work investigates the debugging of defective Vega-Lite visualizations by comparing forum-based human responses, zero-shot AI approaches, and a proposed mixed-initiative human-AI co-debugging system. Using a dataset of 297 Vega-Lite debugging cases from Stack Overflow and a 36-case user study, the authors show that humans provide accurate solutions but are slow, while AI offers rapid guidance but can misalign with user intent; a hybrid approach resolves 86% of cases, more than either humans or AI alone. The paper derives design implications (multimodal clarification, retrieval-augmented generation, real-time code preview, and hybrid feedback) and implements a co-debugging system that integrates these modules to support iterative, visually grounded debugging. The results suggest practical benefits for visualization authors and point toward adaptable, cross-ecosystem tooling that blends human judgment with AI-generated exploration and grounding.

Abstract

Visualization authoring is an iterative process requiring users to adjust parameters to achieve desired aesthetics. Due to its complexity, users often create defective visualizations and struggle to fix them. Many seek help on forums (e.g., Stack Overflow), while others turn to AI, yet little is known about the strengths and limitations of these approaches, or how they can be effectively combined. We analyze Vega-Lite debugging cases from Stack Overflow, categorizing question types by askers, evaluating human responses, and assessing AI performance. Guided by these findings, we design a human-AI co-debugging system that combines LLM-generated suggestions with forum knowledge. We evaluated this system in a user study on 36 unresolved problems, comparing it with forum answers and LLM baselines. Our results show that while forum contributors provide accurate but slow solutions and LLMs offer immediate but sometimes misaligned guidance, the hybrid system resolves 86% of cases, higher than either alone.

Paper Structure

This paper contains 48 sections, 13 figures, 5 tables.

Figures (13)

  • Figure 1: Overview of the data curation process, aligned with our research questions (Q1-Q3). It involves four main steps: (a) Data Collection, (b) Question Selection, (c) Case Selection, and (d) AI Debugging.
  • Figure 2: An example of a multimodal question related to Vega-Lite debugging. Vega-Lite specifications can be run in the compiler to obtain the corresponding visualization and compiler feedback.
  • Figure 3: Examples of visualization debugging responses across labeling categories. Red boxes highlight forum answers (Types 1-3) that do not fully resolve the user’s debugging request. Green boxes show LLM-generated corrections, including correct solutions with minor issues (Type 4, e.g., Q78790349) and fully correct solutions (Type 5, e.g., Q60683632).
  • Figure 4: A heatmap visualizing the solution accuracy across 297 cases, comparing four configurations (Zero-Shot, +Logs, +Dataflow, and +Logs + Dataflow) across four visualization types (Q3).
  • Figure 5: A heatmap visualizing the solution accuracy across 47 cases, comparing seven configurations (Zero-Shot, +Doc, +Ex., +Doc + Ex., +Logs, +Dataflow, and +Doc + Ex. + Logs + Dataflow) across four visualization types (Q3).
  • ...and 8 more figures