WatChat: Explaining perplexing programs by debugging mental models

Kartik Chandra; Katherine M. Collins; Will Crichton; Tony Chen; Tzu-Mao Li; Adrian Weller; Rachit Nigam; Joshua Tenenbaum; Jonathan Ragan-Kelley

TL;DR

WatChat is a principled framework for explaining perplexing program behavior by debugging the user’s mental model. It represents each misconception as a counterfactual (erroneous) semantics for the language or API, uses program synthesis to infer which misconceptions would account for the user’s surprise, and then produces targeted, causal explanations that address only those misconceptions. The framework is instantiated in two domains, JavaScript type coercion (WatChat@JS) and Git (WatChat@Git), and is evaluated against human-written explanations and GPT-4. WatChat’s explanations are concise and correct and align with Miller’s desiderata, whereas GPT-4’s explanations are longer and sometimes incorrect. The work shows how explicit representations of user misconceptions can improve explanation and education in programming tools.

Abstract

Often, a good explanation for a program's unexpected behavior is a bug in the programmer's code. But sometimes, an even better explanation is a bug in the programmer's mental model of the language or API they are using. Instead of merely debugging our current code ("giving the programmer a fish"), what if our tools could directly debug our mental models ("teaching the programmer to fish")? In this paper, we apply recent ideas from computational cognitive science to offer a principled framework for doing exactly that. Given a "why?" question about a program, we automatically infer potential misconceptions about the language/API that might cause the user to be surprised by the program's behavior, and then analyze those misconceptions to provide explanations of the program's behavior. Our key idea is to formally represent misconceptions as counterfactual (erroneous) semantics for the language/API, which can be inferred and debugged using program synthesis techniques. We demonstrate our framework, WatChat, by building systems for explanation in two domains: JavaScript type coercion and the Git version control system. We evaluate WatChat@JS and WatChat@Git by comparing their outputs to experimentally collected human-written explanations in these two domains. We show that WatChat's explanations exhibit key features of human-written explanations, unlike those of a state-of-the-art language model.
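
To make the key idea concrete, here is a minimal Python sketch of inferring a misconception as a counterfactual semantics. It is an illustration under stated assumptions, not WatChat's implementation: the toy language has only `+` over integers and strings, the two misconception names and their corrective sentences are hypothetical, and a brute-force search over subsets stands in for the program synthesis techniques the abstract mentions.

```python
from itertools import combinations

# Hypothetical misconceptions about JavaScript's `+`, each paired with a
# corrective sentence. Neither the names nor the wording come from the paper.
MISCONCEPTIONS = {
    "plus_parses_numeric_strings":
        "`+` does not convert strings to numbers; if either operand is a "
        "string, it concatenates.",
    "plus_rejects_mixed_types":
        "`+` does not raise an error on mixed types; it converts the "
        "non-string operand to a string.",
}


def evaluate(expr, misconceptions=frozenset()):
    """Evaluate a toy expression ("add", left, right) over ints and strs.

    With no misconceptions this follows JavaScript's rule for `+` on
    integers and strings. Each misconception name switches on one
    counterfactual (erroneous) rule instead.
    """
    if not isinstance(expr, tuple):
        return expr  # a literal evaluates to itself
    op, left, right = expr
    assert op == "add", "the toy language only has `+`"
    a = evaluate(left, misconceptions)
    b = evaluate(right, misconceptions)
    if isinstance(a, str) or isinstance(b, str):
        if "plus_parses_numeric_strings" in misconceptions:
            try:
                return int(a) + int(b)
            except ValueError:
                pass  # not numeric: fall through to the other rules
        if "plus_rejects_mixed_types" in misconceptions and type(a) != type(b):
            return "TypeError"
        return str(a) + str(b)  # real rule: string concatenation
    return a + b


def infer_misconceptions(expr, expected):
    """Return the smallest set of misconceptions under which `expr`
    evaluates to what the user expected, or None if no set does."""
    names = sorted(MISCONCEPTIONS)
    for size in range(len(names) + 1):  # try smaller sets first
        for subset in combinations(names, size):
            if evaluate(expr, frozenset(subset)) == expected:
                return set(subset)
    return None


def explain(expr, expected):
    """Explain the actual result by correcting only the inferred misconceptions."""
    inferred = infer_misconceptions(expr, expected)
    if not inferred:
        return []  # nothing to correct, or the expectation is unexplained
    return [MISCONCEPTIONS[name] for name in sorted(inferred)]


program = ("add", 1, "2")       # stands for the JavaScript expression 1 + "2"
actual = evaluate(program)      # the real result, "12"
explanation = explain(program, expected=3)
```

For a user who expected `3` from `1 + "2"`, the search selects only the hypothetical `plus_parses_numeric_strings` misconception, so the explanation addresses that one belief and says nothing about the other. Trying smaller sets first is one simple way to prefer the most economical account of the user's surprise.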

Paper Structure (47 sections, 7 figures, 5 tables)

Introduction
The WatChat framework for explanation
Evaluation
Methods
Choice of scenarios
Comparison to human-written explanations
Comparison to large language models (LLMs)
Coding responses
Results
Length and correctness of explanations
Meeting Miller's desiderata
Choosing among multiple explanations
Differences between humans and WatChat@Git
Conclusion
WatChat for JavaScript
...and 32 more sections

Figures (7)

Figure 1: Overview of the WatChat framework for explanation, here applied to explain the output of a JavaScript program. Panel "A" shows WatChat's model of how programmers come to ask "why?" questions. Panel "B" shows how WatChat reasons over the model in Panel "A" to find a good explanation.
Figure 2: High-level overview of this paper. We describe WatChat, a general framework for explanation, and we apply it to produce two systems: WatChat@JS and WatChat@Git.
Figure 3: Average length and correctness of explanations across our 11 scenarios. Explanations from WatChat and humans are consistently correct and succinct, typically well under 500 characters. Explanations from GPT-4, in contrast, are often incorrect (especially in Git scenarios), and are typically around 1,000 to 1,500 characters long. All error bars in this and subsequent figures are 95% confidence intervals. In the bottom graph, error bars are narrower for human and GPT-4 responses than for WatChat's responses: this is because for humans and GPT-4, data is aggregated across several independent responses/rollouts in addition to across the four coders.
Figure 4: How do explanations depend on the scenario being asked about (Section \ref{sec:js-analysis})? Here, each row considers one pair of example JavaScript scenarios. Each plot within a row shows how a particular statement $S$'s presence in an explanation varies by scenario. WatChat tracks trends in human responses well, but GPT-4 often over-explains. This is particularly salient in the three shaded plots, where GPT-4 (green line) departs from the trend in human (blue) and WatChat (orange) explanations. (See note on error bars in Figure \ref{fig:summary}.)
Figure 5: What happens to explanations when the user's mental model is ambiguous (Section \ref{sec:git-analysis})? In aggregate, human-authored explanations address a variety of statements for each scenario (top); however, each individual human-authored explanation typically contains only a subset of those statements (bottom). This suggests that different people inferred different erroneous mental models and addressed different misconceptions. WatChat@Git exhibits similar behavior, though in many scenarios its explanations contain fewer statements than human-written explanations. This in turn suggests new misconceptions to add to our system. (See note on error bars in Figure \ref{fig:summary}.)
...and 2 more figures
