Anteater: Interactive Visualization of Program Execution Values in Context

Rebecca Faust, Katherine Isaacs, William Z. Bernstein, Michael Sharp, Carlos Scheidegger

TL;DR

This paper presents Anteater, a visualization-first debugger that automatically instruments Python programs to produce execution traces enriched with user-specified variable values. By organizing traces into a generalized context tree and related value plots, Anteater enables global overviews and interactive exploration that expose execution structure, value trends, dependencies, and relationships beyond traditional line-by-line debuggers. The approach leverages a tracing pipeline (AST-based instrumentation) and a data backend (JSON-to-SQL, Vega-Lite visualization) to support flexible visualizations and interactions, including selection, filtering, faceting, and linking to source code. Preliminary user studies and a comparative IDE evaluation indicate that Anteater supports discovery of bugs and deeper program understanding, though the work notes limitations in trace size, single-threaded focus, and the need for future integration with IDEs and broader scalability. Overall, Anteater demonstrates the potential of visualization-centric debugging to provide richer, global insights into program execution and value behavior, with practical implications for debugging workflows and education.

Abstract

Debugging is famously one of the hardest parts of programming. In this paper, we tackle the question: what does a debugging environment look like when we take interactive visualization as a central design principle? We introduce Anteater, an interactive visualization system for tracing and exploring the execution of Python programs. Existing systems often have visualization components built on top of an existing infrastructure. In contrast, Anteater's organization of trace data enables an intermediate representation which can be leveraged to automatically synthesize a variety of visualizations and interactions. These interactive visualizations help with tasks such as discovering important structures in the execution and understanding and debugging unexpected behaviors. To assess the utility of Anteater, we conducted a participant study where programmers completed tasks on their own Python programs using Anteater. Finally, we discuss limitations and where further research is needed.
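To make the AST-based instrumentation mentioned in the TL;DR more concrete, the following is a minimal sketch of how assignments to a user-chosen variable could be rewritten to record their values. It is not Anteater's tracer: the names `TRACE`, `_record`, `TrackAssignments`, and `run_instrumented`, and the shape of each trace record, are assumptions made for illustration only.

```python
import ast

# Hypothetical in-memory trace; Anteater's actual trace format may differ.
TRACE = []


def _record(name, lineno, value):
    """Append one tracked value to the trace and return it unchanged."""
    TRACE.append({"name": name, "line": lineno, "value": value})
    return value


class TrackAssignments(ast.NodeTransformer):
    """Wrap the right-hand side of `x = expr` in `_record("x", line, expr)`."""

    def __init__(self, tracked):
        self.tracked = set(tracked)

    def visit_Assign(self, node):
        self.generic_visit(node)  # instrument nested statements first
        target = node.targets[0]
        is_simple = len(node.targets) == 1 and isinstance(target, ast.Name)
        if is_simple and target.id in self.tracked:
            node.value = ast.Call(
                func=ast.Name(id="_record", ctx=ast.Load()),
                args=[ast.Constant(target.id), ast.Constant(node.lineno), node.value],
                keywords=[],
            )
        return node


def run_instrumented(source, tracked):
    """Instrument `source`, execute it, and return the collected trace."""
    tree = TrackAssignments(tracked).visit(ast.parse(source))
    ast.fix_missing_locations(tree)  # new nodes need line/column info
    TRACE.clear()
    namespace = {"_record": _record}
    exec(compile(tree, "<instrumented>", "exec"), namespace)
    return list(TRACE)


# A recursive Fibonacci program, tracking the variable "val".
SOURCE = """
def fib(n):
    if n < 2:
        val = n
    else:
        val = fib(n - 1) + fib(n - 2)
    return val

fib(4)
"""

trace = run_instrumented(SOURCE, ["val"])
values = [entry["value"] for entry in trace]
```

Because `_record` returns its argument, the instrumented program behaves like the original while every assignment to `val` is logged in execution order. The repeated sub-sequences in `values` reflect the redundant recursive calls that the paper's Fibonacci example highlights.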

Paper Structure

This paper contains 67 sections, 8 figures, 3 tables.

Figures (8)

  • Figure 2: An overview of the Anteater UI on a recursive Fibonacci program, tracking the variable "val". (A) shows the UI presented by Anteater (not including (B)). The generalized context tree (GCT), or icicle plot, shown on the top right side of (A), shows the structure of the execution trace. The teal blocks represent function calls while the varying shades of purple represent the value of "val" at that instance. We can see the recursive calling structure of the Fibonacci function and can easily identify where it is repeating work. The plot currently shows a scatterplot view of the variable "val" over time. Brushing over the scatterplot highlights the corresponding instances in the GCT (the red blocks shown in the GCT on the right side of (A)) and the context bar. The scatterplot shows repetitive patterns that indicate that Fibonacci is doing redundant work. (B) shows a second view of the GCT (inset into the image of the main UI) after we've clicked on a block in the tree which caused its dependencies to be highlighted in red. This shows that the selected block (on the far right of the fifth row in the GCT in (B)), representing an instance of "val", depends on the prior two calls to the Fibonacci function (shown by the two blocks highlighted in red).
  • Figure 3: An overview of the Anteater system. In (a), a user chooses variables and expressions to track using the Anteater interface. This defines the trace specification. Then, Anteater sends the trace specification through the web interface to the Python backend, along with the source code. Next, in (b), the Anteater tracer instruments the source code to collect execution information along with the specified values. (c) shows a simplified version of this instrumentation. After the code is instrumented, Anteater runs the program using Python to create the program trace. This trace is passed back through the web interface to the Anteater front end where (in (d)) it is visualized and presented to the user.
  • Figure 4: An overview of how Anteater goes from source code to visualization. (A) shows the initial source code. We are going to track the variable "val". After the source code is instrumented, as demonstrated in Fig. \ref{['fig:sysOVerview']}, the instrumented program creates a trace cell as shown in (B). Anteater then puts the JSON into a SQL table as shown in (C). From there, Anteater queries the table to select all points from "Tracked" that have the name "val" and passes them to Anteater's Vega-Lite generator, which generates a Vega-Lite specification (as shown in (D)) for the corresponding plot. Anteater then renders the specification to create a scatterplot of those points over time (shown in (E)).
  • Figure 5: An example of Anteater splitting the data by a structural element. Anteater splits the data by instances of a for loop at line 167, which corresponds to iterations of the loop at line 166 (the selected block in the generalized context tree). The plot shows one boxplot per loop instance.
  • Figure 6: Debugging Gradient Descent with Anteater. In (A) it is immediately apparent in both the generalized context tree and the histogram that there is a bug causing NaNs, shown in green in both the histogram and GCT (NaN means "Not A Number", a special floating-point value that indicates a numerical failure). In (B), we switch to the scatterplot view to see how the values behave before they become NaN. The values are mostly centered around zero before becoming an extremely small negative value, then going to infinity and becoming NaN. We suspect that the values centered around zero are not actually zeros, so we filter the values in the scatterplot to allow us to zoom in on them and switch to a symmetric log scale, shown in (C). Now we see that the values are oscillating, which suggests the problem of exploding gradients caused by a training rate that is too large. Fig. \ref{['fig:gdOverview2']} shows the Anteater visualizations after correcting the bug.
  • ...and 3 more figures
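
The data flow described in the Figure 4 caption (JSON trace, SQL table, query by name, Vega-Lite specification) can be sketched as follows. This is only an illustration under stated assumptions: the table name "Tracked" and the variable "val" come from the caption, while the column names (`step`, `name`, `line`, `value`), the sample records, and the helper functions `load_trace`, `query_variable`, and `scatterplot_spec` are hypothetical and are not taken from Anteater's implementation.

```python
import json
import sqlite3

# Hypothetical JSON trace; the real trace schema is not given in this excerpt.
trace_json = json.dumps([
    {"step": 0, "name": "val", "line": 4, "value": 1},
    {"step": 1, "name": "val", "line": 4, "value": 0},
    {"step": 2, "name": "n", "line": 2, "value": 2},
    {"step": 3, "name": "val", "line": 6, "value": 1},
])


def load_trace(conn, records):
    """Store the trace records in a SQL table named Tracked."""
    conn.execute(
        "CREATE TABLE Tracked (step INTEGER, name TEXT, line INTEGER, value REAL)"
    )
    conn.executemany(
        "INSERT INTO Tracked VALUES (:step, :name, :line, :value)", records
    )


def query_variable(conn, name):
    """Select all tracked points with the given variable name, in time order."""
    rows = conn.execute(
        "SELECT step, value FROM Tracked WHERE name = ? ORDER BY step", (name,)
    )
    return [{"step": step, "value": value} for step, value in rows]


def scatterplot_spec(points, name):
    """Build a Vega-Lite scatterplot specification of value over time."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": points},
        "mark": "point",
        "encoding": {
            "x": {"field": "step", "type": "quantitative", "title": "time (trace step)"},
            "y": {"field": "value", "type": "quantitative", "title": name},
        },
    }


conn = sqlite3.connect(":memory:")
load_trace(conn, json.loads(trace_json))
points = query_variable(conn, "val")
spec = scatterplot_spec(points, "val")
spec_text = json.dumps(spec, indent=2)  # could be handed to any Vega-Lite renderer
```

Keeping the trace in a SQL table means each plot is just a query plus a specification, so switching from a scatterplot to another view would only require changing the query or the `mark` and `encoding` fields rather than re-running the program.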