When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming

Hussein Mozannar; Gagan Bansal; Adam Fourney; Eric Horvitz

TL;DR

The paper addresses the challenge of when to show code suggestions in AI-assisted programming by formulating a utility-based framework that leverages human feedback embedded in telemetry. It introduces CDHF, a two-stage, latency-aware policy that predicts acceptance probabilities and uses a thresholded display decision to hide or show suggestions, aiming to minimize total coding time. Retrospective analyses on Copilot telemetry with 535 programmers show that CDHF can reduce needless displays (e.g., hiding ~25% of shown suggestions) and modestly improve acceptance rates, while quantifying a trade-off between latency and display frequency. The work contributes a practical, data-driven policy for display decisions grounded in utility theory, highlights the importance of latent programmer state, and suggests directions for prospective user studies and ranking-based selection of suggestions for broader impact in AI-assisted tasks.

Abstract

AI-powered code-recommendation systems, such as Copilot and CodeWhisperer, provide code suggestions inside a programmer's environment (e.g., an IDE) with the aim of improving productivity. We pursue mechanisms for leveraging signals about programmers' acceptance and rejection of code suggestions to guide recommendations. We harness data drawn from interactions with GitHub Copilot, a system used by millions of programmers, to develop interventions that can save time for programmers. We introduce a utility-theoretic framework to drive decisions about suggestions to display versus withhold. The approach, conditional suggestion display from human feedback (CDHF), relies on a cascade of models that provide the likelihood that recommended code will be accepted. These likelihoods are used to selectively hide suggestions, reducing both latency and programmer verification time. Using data from 535 programmers, we perform a retrospective evaluation of CDHF and show that we can avoid displaying a significant fraction of suggestions that would have been rejected. We further demonstrate the importance of incorporating the programmer's latent unobserved state in decisions about when to display suggestions through an ablation study. Finally, we showcase how using suggestion acceptance as a reward signal for guiding the display of suggestions can lead to suggestions of reduced quality, indicating an unexpected pitfall.
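The cascade described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the two scoring callables, and both thresholds are hypothetical stand-ins, and the only structure taken from the text is that a first decision can be made before a suggestion is generated (saving latency) and a second one after it is available.

```python
from typing import Callable, Optional


def cascade_display_decision(
    context: str,
    stage1_accept_prob: Callable[[str], float],
    generate_suggestion: Callable[[str], str],
    stage2_accept_prob: Callable[[str, str], float],
    stage1_threshold: float,
    stage2_threshold: float,
) -> Optional[str]:
    """Return the suggestion to display, or None to hide it.

    All arguments are hypothetical: the two *_accept_prob callables stand in
    for learned acceptance predictors, and the thresholds are assumed to be
    tuned elsewhere.
    """
    # Stage 1: score using only the context. If acceptance looks unlikely,
    # hide without ever generating the suggestion, which avoids its latency.
    if stage1_accept_prob(context) < stage1_threshold:
        return None

    # Stage 2: generate the suggestion, then score with it in hand.
    suggestion = generate_suggestion(context)
    if stage2_accept_prob(context, suggestion) < stage2_threshold:
        return None

    return suggestion
```

In this sketch the generator is only called when the first stage passes, which is where the latency saving would come from; how the real system splits work between the two stages is not specified in this section.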

Paper Structure (16 sections, 1 theorem, 11 equations, 13 figures, 1 table)

Introduction
Related Work
Problem Setting
Theoretical Formulation of Suggestion Utility
Conditional Suggestion Display From Human Feedback
Experiments
Dataset and Feature Engineering.
Model Evaluation
Retrospective Evaluation of CDHF
Which Suggestion to Show?
Conclusion
Extended Related Work
AI-Assisted Programming.
Derivation of $\mathbb{P}^*$
Model Evaluation and Analysis
...and 1 more sections

Key Result

Proposition 1

Under the assumption that the programmer spends more time writing code when they reject a suggestion than when they accept a suggestion and edit it, given specific code, suggestion, and latent state $(X,S,\phi)$, if the programmer's probability of accepting satisfies $\mathbb{P}(A=\textrm{accept}|X,S,\phi) < \mathbb{P}^*$, then the suggestion should not be shown. Note that $\mathbb{P}^*$ is defined as a function ...
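To make the break-even idea behind Proposition 1 concrete, here is a small sketch under a simplified time model that is an assumption of this example, not the paper's exact formulation. It assumes zero latency and three hypothetical expected times: `t_accept` (verify, accept, and edit the suggestion), `t_reject` (verify, reject, and write the code by hand), and `t_write` (write the code with no suggestion shown). The expected time when showing is then linear in the acceptance probability, and the threshold is the probability at which showing and not showing cost the same.

```python
def break_even_probability(t_accept: float, t_reject: float, t_write: float) -> float:
    """Acceptance probability at which showing and hiding cost equal time.

    Assumed model (illustrative only, zero latency):
        time if shown     = p * t_accept + (1 - p) * t_reject
        time if not shown = t_write
    Setting the two equal and solving for p gives the value returned here.
    """
    if t_reject <= t_accept:
        # Mirrors the proposition's assumption that rejecting costs more
        # time than accepting and editing.
        raise ValueError("expected t_reject > t_accept")
    p_star = (t_reject - t_write) / (t_reject - t_accept)
    # Clamp to a valid probability.
    return min(1.0, max(0.0, p_star))


def should_show(p_accept: float, t_accept: float, t_reject: float, t_write: float) -> bool:
    """Show the suggestion only if acceptance is at least the break-even value."""
    return p_accept >= break_even_probability(t_accept, t_reject, t_write)
```

For example, with the hypothetical values `t_accept=4`, `t_reject=12`, and `t_write=10` seconds, the break-even probability is (12 - 10) / (12 - 4) = 0.25, so under this toy model a suggestion with a 20% chance of acceptance would be hidden.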

Figures (13)

Figure 1: Operating mode of Copilot inside Visual Studio Code and how CDHF influences the interaction by selectively hiding certain suggestions. The data collected by the interaction is stored in telemetry and is used to train CDHF to create a feedback loop.
Figure 2: Schematic of telemetry with Copilot as a timeline. For a given coding session, the telemetry contains a sequence of timestamps and actions with associated prompts and suggestions.
Figure 3: Graphical depiction of the analysis of Proposition 1 when the latency is zero. The y-axis shows total time and the x-axis is the programmer's probability of accepting $\mathbb{P}(A=\textrm{accept}|X,S,\phi)$. At probability $\mathbb{P}^*$, showing and not showing the suggestion have equal time cost.
Figure 4: Features used to build the action prediction model in the Experiments section, including features from the suggestion, prompt, and session.
Figure 5: Evaluation of CDHF for selectively hiding suggestions. For a given constraint on FNR (accuracy when a suggestion is hidden) on the x-axis, the y-axis shows the fraction of all suggestions that can be hidden while guaranteeing the desired FNR. The curves vary how often the hide decision is made before the suggestion is generated, $R := \mathbb{E}[r(x)]$: when $R=0$, the suggestion is always generated before deciding whether to hide it; when $R=1$, the decision to hide is always made without knowing the suggestion.
...and 8 more figures
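The trade-off plotted in Figure 5 (how many suggestions can be hidden for a given FNR budget) can be sketched as a threshold search on held-out data. This is an illustrative sketch, not the paper's evaluation code. It assumes the standard definition of false negative rate (accepted suggestions that get hidden, divided by all accepted suggestions), which may differ from the paper's exact usage, and the function and argument names are hypothetical.

```python
from typing import List, Tuple


def max_hide_fraction(
    scores: List[float], accepted: List[int], max_fnr: float
) -> Tuple[float, float]:
    """Find the threshold hiding the most suggestions within an FNR budget.

    scores:   predicted acceptance probabilities (hypothetical model output)
    accepted: 1 if the programmer accepted the suggestion, else 0
    Returns (fraction_hidden, threshold); suggestions with score < threshold
    are treated as hidden.
    """
    n = len(scores)
    total_accepted = sum(accepted)
    best = (0.0, 0.0)
    # Candidate thresholds: each observed score, plus +inf to hide everything.
    for threshold in sorted(set(scores)) + [float("inf")]:
        hidden_labels = [a for s, a in zip(scores, accepted) if s < threshold]
        false_negatives = sum(hidden_labels)  # accepted suggestions we would hide
        fnr = false_negatives / total_accepted if total_accepted else 0.0
        fraction_hidden = len(hidden_labels) / n
        if fnr <= max_fnr and fraction_hidden > best[0]:
            best = (fraction_hidden, threshold)
    return best
```

Sweeping `max_fnr` over a range of values and recording the returned fraction would trace out one curve of the kind the caption describes, under the assumptions stated above.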

Theorems & Definitions (3)

Definition 1: Suggestion Utility
Proposition 1
proof
