Take It, Leave It, or Fix It: Measuring Productivity and Trust in Human-AI Collaboration
Crystal Qian, James Wexler
TL;DR
Take It, Leave It, or Fix It investigates how access to a conversational AI (Bard) shapes productivity and trust during domain-specific software engineering tasks. The study uses a within-subject, mixed-methods design with N=76 participants who took a two-pass programming exam split by question type. It finds complex, context-dependent effects. Novices perform better on open-ended "solve"-type questions, while overall productivity gains depend on expertise and task type. Participants rely increasingly on the AI over time and show signs of automation complacency. Trust calibrates downward after exposure, and experts are more likely to distrust the AI. The work highlights design implications for appropriate trust, uncertainty communication, and robust source attribution to make human-AI collaboration in professional settings more reliable.
Abstract
Although recent developments in generative AI have greatly enhanced the capabilities of conversational agents such as Google's Gemini (formerly Bard) or OpenAI's ChatGPT, it is unclear whether using these agents helps users across different contexts. To better understand how access to conversational AI affects productivity and trust, we conducted a mixed-methods, task-based user study, observing software engineers (N=76) as they completed a programming exam with and without access to Bard. Effects on performance, efficiency, satisfaction, and trust vary depending on user expertise, question type (open-ended "solve" vs. definitive "search" questions), and measurement type (demonstrated vs. self-reported). Our findings include evidence of automation complacency, increased reliance on the AI over the course of the task, and improved performance for novices on "solve"-type questions when using the AI. We discuss common behaviors, design recommendations, and impact considerations for improving collaboration with conversational AI.
