When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming
Hussein Mozannar, Gagan Bansal, Adam Fourney, Eric Horvitz
TL;DR
The paper addresses the question of when to show code suggestions in AI-assisted programming by formulating a utility-based framework that leverages the human feedback embedded in telemetry. It introduces CDHF, a two-stage, latency-aware policy that predicts the probability that a suggestion will be accepted and applies a threshold to decide whether to show or hide it, with the goal of minimizing total coding time. Retrospective analyses of Copilot telemetry from 535 programmers show that CDHF can reduce needless displays (hiding roughly 25% of the suggestions that were originally shown) and modestly improve acceptance rates, and they quantify a trade-off between latency and display frequency. The work contributes a practical, data-driven display policy grounded in utility theory, highlights the importance of the programmer's latent state, and points to future directions: prospective user studies and ranking-based selection among suggestions, with broader relevance to other AI-assisted tasks.
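The two-stage display decision can be illustrated with a minimal sketch. It assumes hypothetical callables `stage1_model` (scores acceptance from context features alone), `generate` (stands in for the code model), and `stage2_model` (re-scores once the suggestion text exists), plus hypothetical thresholds `tau1` and `tau2`. Reading "latency-aware" as "stage 1 runs before generation, so a low score skips the model call entirely" is an assumption here; this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CascadeDecision:
    generated: bool                # was the code model called at all?
    shown: bool                    # is the suggestion displayed?
    p_context: float               # stage-1 acceptance estimate
    p_suggestion: Optional[float]  # stage-2 estimate (None if skipped)


def cascade_display_decision(
    context_features: list[float],
    stage1_model: Callable[[list[float]], float],     # hypothetical
    generate: Callable[[], str],                       # hypothetical
    stage2_model: Callable[[list[float], str], float],  # hypothetical
    tau1: float,
    tau2: float,
) -> CascadeDecision:
    # Stage 1: estimate acceptance from context only, before generation.
    p1 = stage1_model(context_features)
    if p1 < tau1:
        # Skip generation: saves model latency and verification time.
        return CascadeDecision(False, False, p1, None)
    suggestion = generate()
    # Stage 2: re-estimate acceptance now that the suggestion is known.
    p2 = stage2_model(context_features, suggestion)
    return CascadeDecision(True, p2 >= tau2, p1, p2)


# Toy usage with made-up scoring functions.
def toy_stage1(feats: list[float]) -> float:
    return feats[0]


def toy_stage2(feats: list[float], suggestion: str) -> float:
    return min(1.0, feats[0] + 0.01 * len(suggestion))


d = cascade_display_decision([0.45], toy_stage1, lambda: "return x",
                             toy_stage2, tau1=0.2, tau2=0.5)
print(d.generated, d.shown)
```

Splitting the decision this way lets cheap context-only predictions prune requests early, while the more informed stage 2 handles the final show/hide call.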
Abstract
AI-powered code-recommendation systems, such as Copilot and CodeWhisperer, provide code suggestions inside a programmer's environment (e.g., an IDE) with the aim of improving productivity. We pursue mechanisms for leveraging signals about programmers' acceptance and rejection of code suggestions to guide recommendations. We harness data drawn from interactions with GitHub Copilot, a system used by millions of programmers, to develop interventions that can save programmers time. We introduce a utility-theoretic framework to drive decisions about which suggestions to display and which to withhold. The approach, conditional suggestion display from human feedback (CDHF), relies on a cascade of models that estimate the likelihood that recommended code will be accepted. These likelihoods are used to selectively hide suggestions, reducing both latency and programmer verification time. Using data from 535 programmers, we perform a retrospective evaluation of CDHF and show that we can avoid displaying a significant fraction of suggestions that would have been rejected. Through an ablation study, we further demonstrate the importance of incorporating the programmer's latent, unobserved state into decisions about when to display suggestions. Finally, we show that using suggestion acceptance as a reward signal for guiding the display of suggestions can lead to suggestions of reduced quality, an unexpected pitfall.
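The utility-theoretic framing can be illustrated with a deliberately simplified expected-time model. Here $p$ is the predicted acceptance probability, $T_v$ is the time a programmer spends reading and verifying a shown suggestion, and $T_s$ is the typing time saved when it is accepted; all three are assumed quantities for illustration, and the paper's actual utility model may include further terms such as latency.

```latex
\begin{aligned}
\Delta T(\text{show}) &= T_v - p\,T_s
  && \text{(expected change in coding time from showing)} \\
\text{show} &\iff \Delta T(\text{show}) < 0
  \iff p > \tau, \qquad \tau = \frac{T_v}{T_s}.
\end{aligned}
```

For example, if verification takes 2 s and an accepted suggestion saves 8 s, then $\tau = 0.25$, and under this toy model any suggestion with $p \le 0.25$ would be hidden.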
