Take It, Leave It, or Fix It: Measuring Productivity and Trust in Human-AI Collaboration

Crystal Qian; James Wexler

TL;DR

Take It, Leave It, or Fix It investigates how access to a conversational AI (Bard) shapes productivity and trust during domain-specific software engineering tasks. Using a within-subject, mixed-methods design with $N=76$ participants and a two-pass exam split by question type, the study finds complex, context-dependent effects: novices show performance gains on open-ended solve-type questions, while overall productivity gains depend on expertise and task type. Participants rely on the AI more as the task progresses, consistent with automation complacency, and trust calibrates downward after exposure, with experts more likely to distrust the AI. The work draws design implications for appropriate trust, uncertainty communication, and robust source attribution to make human-AI collaboration in professional settings more reliable.

Abstract

Although recent developments in generative AI have greatly enhanced the capabilities of conversational agents such as Google's Gemini (formerly Bard) and OpenAI's ChatGPT, it is unclear whether using these agents helps users across different contexts. To better understand how access to conversational AI affects productivity and trust, we conducted a mixed-methods, task-based user study, observing 76 software engineers as they completed a programming exam with and without access to Bard. Effects on performance, efficiency, satisfaction, and trust vary with user expertise, question type (open-ended "solve" vs. definitive "search" questions), and measurement type (demonstrated vs. self-reported). Our findings include evidence of automation complacency, increased reliance on the AI over the course of the task, and improved performance for novices on "solve"-type questions when using the AI. We discuss common behaviors, design recommendations, and impact considerations for improving collaboration with conversational AI.

Paper Structure (36 sections, 7 figures, 7 tables)

Introduction
Related Work
  Usage of conversational agents
  Variation in interactions by user ability
  Variation in interactions by context
Experiment Design
  Procedure
  Task design
  Expertise measurement
  Thematic analysis
Results
  Productivity
    Performance
    Efficiency
    Satisfaction
...and 21 more sections

Figures (7)

Figure 1: First pass on a Bard-first, solve-type question.
Figure 2: Second pass on a Bard-first, solve-type question.
Figure 3: A structural equation model showing correlations between expertise measures and productivity outcomes; $\beta$ is the normalized effect size in standard deviations, and $S$ denotes standard error.
Figure 4: An OLS regression of the difference in scores between passes on expertise, with 95% confidence intervals.
Figure 5: Answer-changing percentages between passes.
...and 2 more figures
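Figure 4's caption describes an ordinary least squares (OLS) regression of the between-pass score difference on expertise, reported with 95% confidence intervals. Below is a minimal, stdlib-only Python sketch of that kind of single-predictor fit; it is not the authors' analysis code. The function name `ols_slope_ci`, the toy data, and the use of a normal critical value (z = 1.96) in place of the exact t quantile are all illustrative assumptions.

```python
import math
from statistics import mean


def ols_slope_ci(x, y, z=1.96):
    """Hypothetical helper: simple OLS of y on a single predictor x.

    Returns (intercept, slope, slope_se, (ci_low, ci_high)).
    The interval uses the critical value z; 1.96 is a large-sample normal
    approximation to a 95% interval. The exact interval would use the
    t quantile with n - 2 degrees of freedom instead.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired samples with n >= 3")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    slope_se = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, slope_se, (slope - z * slope_se, slope + z * slope_se)


# Hypothetical toy data: expertise scores and (second pass - first pass) score differences.
expertise = [1, 2, 3, 4, 5]
score_diff = [2, 1, 4, 3, 5]
intercept, slope, se, (lo, hi) = ols_slope_ci(expertise, score_diff)
print(f"slope={slope:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
# prints: slope=0.800, 95% CI=(0.121, 1.479)
```

With n = 76, the t quantile for 74 degrees of freedom (about 1.99) is close to 1.96, so the normal approximation shifts the interval only slightly; for small samples, pass the appropriate t quantile as `z`.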
