WatChat: Explaining perplexing programs by debugging mental models
Kartik Chandra, Katherine M. Collins, Will Crichton, Tony Chen, Tzu-Mao Li, Adrian Weller, Rachit Nigam, Joshua Tenenbaum, Jonathan Ragan-Kelley
TL;DR
WatChat introduces a principled framework for explaining perplexing program behavior by debugging the user’s mental model rather than the user’s code. It represents misconceptions as counterfactual semantics ("misinterpreters") for the language or API, uses program synthesis to infer which misinterpreters could account for the user’s surprise, and then produces targeted, causal explanations that address only those misconceptions. The framework is instantiated in two domains, JavaScript type coercion (WatChat@JS) and Git (WatChat@Git), and evaluated against human-written explanations and GPT-4. WatChat’s explanations are concise and correct and align with Miller’s desiderata for good explanations, whereas GPT-4’s explanations tend to be longer and are sometimes incorrect. This work demonstrates a scalable, interactive approach to explainability that uses explicit representations of user misconceptions to improve programming tools and programming education.
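To make the idea of a misinterpreter concrete, here is a minimal Python sketch, not WatChat's actual implementation. It assumes a toy fragment of JavaScript's `+` and `-` on integers and strings only, and three hypothetical misconception flags (`plus_always_numeric`, `left_operand_decides`, `minus_rejects_strings`); each set of flags defines one erroneous interpreter. Where WatChat uses program synthesis, the sketch simply enumerates flag sets smallest-first, a crude stand-in for a simplicity prior, and keeps those whose predictions match what the user expected.

```python
import math
from itertools import combinations

# Hypothetical misconception flags, each mapped to the corrective explanation
# shown when that flag is inferred. A set of flags defines one misinterpreter.
MISCONCEPTIONS = {
    "left_operand_decides": "`+` concatenates if either operand is a string, not only the left one.",
    "minus_rejects_strings": "`-` does not throw on strings: it converts both operands to numbers.",
    "plus_always_numeric": "`+` concatenates when either operand is a string instead of adding numerically.",
}

TYPE_ERROR = object()  # sentinel for "the user expected an exception"


def to_number(v):
    """Simplified ToNumber for ints and strings: '' -> 0, unparsable -> NaN."""
    if isinstance(v, str):
        if not v.strip():
            return 0.0
        try:
            return float(v)
        except ValueError:
            return math.nan
    return float(v)


def to_string(v):
    """Simplified ToString: print integral floats without a trailing '.0'."""
    if isinstance(v, float) and v.is_integer():
        return str(int(v))
    return str(v)


def evaluate(op, a, b, flags=frozenset()):
    """Evaluate `a op b`; an empty flag set gives the (simplified) real semantics."""
    either_str = isinstance(a, str) or isinstance(b, str)
    if op == "+":
        if "plus_always_numeric" in flags:
            return to_number(a) + to_number(b)
        # Real rule: one string operand forces concatenation.
        # Misconception: only a string on the left does.
        concat = isinstance(a, str) if "left_operand_decides" in flags else either_str
        if concat:
            return to_string(a) + to_string(b)
        return to_number(a) + to_number(b)
    if op == "-":
        if "minus_rejects_strings" in flags and either_str:
            return TYPE_ERROR
        return to_number(a) - to_number(b)
    raise ValueError(f"unsupported operator {op!r}")


def same(x, y):
    """Equality on results, treating NaN as matching NaN."""
    if isinstance(x, float) and isinstance(y, float) and math.isnan(x) and math.isnan(y):
        return True
    return x == y


def infer_misconceptions(op, a, b, expected):
    """Return every smallest flag set whose misinterpreter predicts `expected`."""
    if same(evaluate(op, a, b), expected):
        return [frozenset()]  # the user's prediction is already correct
    names = sorted(MISCONCEPTIONS)
    for size in range(1, len(names) + 1):
        hits = [frozenset(c) for c in combinations(names, size)
                if same(evaluate(op, a, b, frozenset(c)), expected)]
        if hits:
            return hits
    return []  # no modeled misconception explains the expectation


def explain(op, a, b, expected):
    """Turn the first inferred candidate into a targeted explanation."""
    actual = evaluate(op, a, b)
    candidates = infer_misconceptions(op, a, b, expected)
    if candidates == [frozenset()]:
        return "That prediction is already correct."
    if not candidates:
        return f"The result is {actual!r}; no modeled misconception predicts that."
    fixes = " ".join(MISCONCEPTIONS[name] for name in sorted(candidates[0]))
    return f"The result is {actual!r}. {fixes}"


if __name__ == "__main__":
    print(infer_misconceptions("+", 1, "2", 3))
    print(explain("+", "1", 2, 3))
```

For `1 + "2"` with an expected result of `3`, both `left_operand_decides` and `plus_always_numeric` fit, so the sketch returns two candidates; a follow-up question about `"1" + 2` would rule out `left_operand_decides`. Keeping every minimal candidate, rather than committing to one early, is what makes such disambiguating questions possible.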
Abstract
Often, a good explanation for a program's unexpected behavior is a bug in the programmer's code. But sometimes, an even better explanation is a bug in the programmer's mental model of the language or API they are using. Instead of merely debugging our current code ("giving the programmer a fish"), what if our tools could directly debug our mental models ("teaching the programmer to fish")? In this paper, we apply recent ideas from computational cognitive science to offer a principled framework for doing exactly that. Given a "why?" question about a program, we automatically infer potential misconceptions about the language/API that might cause the user to be surprised by the program's behavior, and then analyze those misconceptions to explain the program's behavior. Our key idea is to formally represent misconceptions as counterfactual (erroneous) semantics for the language/API, which can be inferred and debugged using program synthesis techniques. We demonstrate our framework, WatChat, by building systems for explanation in two domains: JavaScript type coercion, and the Git version control system. We evaluate WatChatJS and WatChatGit by comparing their outputs to experimentally-collected human-written explanations in these two domains: we show that WatChat's explanations exhibit key features of human-written explanation, unlike those of a state-of-the-art language model.
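The same idea carries over to Git. The minimal Python sketch below is an illustration under stated assumptions, not WatChatGit's implementation: it replays a toy command sequence (`write`, `add`, `commit`, hypothetical stand-ins for editing a file, `git add`, and `git commit`) under simplified real staging semantics, where `git add` stages a file's contents as they are at that moment, and under one hypothetical misconception (`add_is_live` or `commit_takes_worktree`). It then reports the first step at which the two disagree about what the next commit would contain, which is a natural place for a targeted, causal explanation to point.

```python
def run(commands, misconception=None):
    """Replay commands; return (staged view after each step, list of commits).

    Commands are tuples: ("write", path, text), ("add", path), or ("commit",).
    misconception=None models real Git staging (simplified). Hypothetical flags:
      "add_is_live": `git add` starts tracking a path, so later edits are staged too.
      "commit_takes_worktree": `git commit` records the working tree, ignoring the index.
    """
    worktree = {}  # path -> current file contents
    index = {}     # real Git: contents snapshotted at `add` time
    live = set()   # "add_is_live": paths whose *current* contents count as staged
    views, commits = [], []

    def staged():
        # What the next commit would contain under the chosen semantics.
        if misconception == "commit_takes_worktree":
            return dict(worktree)
        if misconception == "add_is_live":
            return {p: worktree[p] for p in live}
        return dict(index)

    for cmd in commands:
        if cmd[0] == "write":
            _, path, text = cmd
            worktree[path] = text
        elif cmd[0] == "add":
            index[cmd[1]] = worktree[cmd[1]]
            live.add(cmd[1])
        elif cmd[0] == "commit":
            commits.append(staged())
        else:
            raise ValueError(f"unknown command {cmd!r}")
        views.append(staged())
    return views, commits


def first_divergence(commands, misconception):
    """Index of the first command after which the user's model and Git disagree."""
    real_views, _ = run(commands)
    imagined_views, _ = run(commands, misconception)
    for i, (real, imagined) in enumerate(zip(real_views, imagined_views)):
        if real != imagined:
            return i
    return None


if __name__ == "__main__":
    cmds = [("write", "f", "v1"), ("add", "f"), ("write", "f", "v2"), ("commit",)]
    step = first_divergence(cmds, "add_is_live")
    print(f"Git committed {run(cmds)[1][-1]}; your model diverges at step {step}: {cmds[step]}")
```

Both hypothetical misconceptions predict that the commit records `v2`, yet they part ways with Git at different steps: `add_is_live` at the third command (index 2), `commit_takes_worktree` at the first (index 0). The explanation each one calls for therefore differs: "edits made after `git add` are not staged" versus "editing a file stages nothing by itself."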
