Investigating and Designing for Trust in AI-powered Code Generation Tools

Ruotong Wang, Ruijia Cheng, Denae Ford, Thomas Zimmermann

TL;DR

This paper investigates how software developers form trust in AI-powered code-generation tools and how interface design can support calibrated trust. The work is a two-stage qualitative study. Study 1 uses retrospective interviews with 17 developers to identify trust determinants (ability, benevolence, integrity) and challenges in evaluating AI outputs, highlighting situational factors that modulate trust. Study 2 then probes three design directions (usage statistics dashboards, in-context quality indicators, and explicit control mechanisms) to scaffold developers' evaluation of AI suggestions, gathering feedback from 12 developers. The findings show that current tools lack effective trust affordances, leading to biased and inefficient trust judgments, and point to promising design concepts that communicate AI capability, enable goal alignment, and support structured trust-building. Collectively, the work offers design recommendations for communicating AI performance, enabling user configuration, and revealing model mechanisms to foster calibrated trust in AI-assisted software development.

Abstract

As AI-powered code generation tools such as GitHub Copilot become popular, it is crucial to understand software developers' trust in AI tools, a key factor for tool adoption and responsible usage. However, we know little about how developers build trust with AI, nor do we understand how to design the interface of generative AI systems to facilitate appropriate levels of trust. In this paper, we describe findings from a two-stage qualitative investigation. We first interviewed 17 developers to contextualize their notions of trust and understand their challenges in building appropriate trust in AI code generation tools. We surfaced three main challenges: building appropriate expectations, configuring AI tools, and validating AI suggestions. To address these challenges, we conducted a design probe study in the second stage to explore design concepts that support developers' trust-building process by 1) communicating AI performance to help users set proper expectations, 2) allowing users to configure AI by setting and adjusting preferences, and 3) offering indicators of model mechanism to support evaluation of AI suggestions. We gathered developers' feedback on how these design concepts can help them build appropriate trust in AI-powered code generation tools, as well as on potential risks in the designs. These findings inform our proposed design recommendations on how to design for trust in AI-powered code generation tools.

Paper Structure (46 sections, 7 figures, 4 tables)

Figures (7)

  • Figure 1: A usage statistics dashboard that displays personalized usage statistics to a user. Both (a) overall usage stats and (b) situational usage stats are shown in a pop-up dashboard in the IDE.
  • Figure 2: Quality indicators that help users better evaluate each AI suggestion.
  • Figure 3: Two control mechanisms that allow users to communicate intentions to the AI tool. (a) A control panel allows users to select system roles at project initialization; (b) a second mechanism allows users to adapt AI behavior during programming sessions.
  • Figure 4: GitHub Copilot interface, as of July 2022.
  • Figure 5: Group 1: Control mechanisms
  • ...and 2 more figures