Investigating and Designing for Trust in AI-powered Code Generation Tools
Ruotong Wang, Ruijia Cheng, Denae Ford, Thomas Zimmermann
TL;DR
This paper investigates how software developers form trust in AI-powered code generation tools and how interface design can support calibrated trust. The work is a two-stage qualitative study. Study 1 uses retrospective interviews with 17 developers to identify trust determinants (ability, benevolence, integrity) and challenges in evaluating AI outputs, highlighting situational factors that modulate trust. Study 2 then probes three design directions (usage statistics dashboards, in-context quality indicators, and explicit control mechanisms) to scaffold trustworthy evaluation, gathering feedback from 12 developers. The findings show that current tools lack effective trust affordances, leading to biased and inefficient trust judgments, and demonstrate promising design concepts to communicate AI capability, enable goal alignment, and support structured trust-building. Collectively, the work offers design recommendations for communicating AI performance, enabling user configuration, and revealing model mechanisms to foster calibrated trust in AI-assisted software development.
Abstract
As AI-powered code generation tools such as GitHub Copilot become popular, it is crucial to understand software developers' trust in AI tools, a key factor for tool adoption and responsible usage. However, we know little about how developers build trust with AI, nor do we understand how to design the interface of generative AI systems to facilitate appropriate levels of trust. In this paper, we describe findings from a two-stage qualitative investigation. We first interviewed 17 developers to contextualize their notions of trust and understand their challenges in building appropriate trust in AI code generation tools. We surfaced three main challenges: building appropriate expectations, configuring AI tools, and validating AI suggestions. To address these challenges, we conducted a design probe study in the second stage to explore design concepts that support developers' trust-building process by 1) communicating AI performance to help users set proper expectations, 2) allowing users to configure AI by setting and adjusting preferences, and 3) offering indicators of model mechanism to support evaluation of AI suggestions. We gathered developers' feedback on how these design concepts can help them build appropriate trust in AI-powered code generation tools, as well as on potential risks in the designs. These findings inform our proposed design recommendations on how to design for trust in AI-powered code generation tools.
