What Lies Beneath? Exploring the Impact of Underlying AI Model Updates in AI-Infused Systems
Vikram Mohanty, Jude Lim, Kurt Luther
TL;DR
This work investigates how users perceive and respond to underlying AI model updates in AI-infused systems, focusing on facial recognition for historical photo identification. Through a controlled online experiment and a real-world diary study on Civil War Photo Sleuth (CWPS), the authors show that users struggle to notice model changes and rely heavily on perceived accuracy rather than objective cues such as latency or result counts. Although newer models can improve technical metrics (precision and recall), these gains do not reliably translate into better human-AI team performance, and users develop divergent folk theories about model behavior. The findings underscore the need for granular, user-centered communication about model updates and suggest strategies for aligning user expectations and workflows with evolving system capabilities, both in this domain and in others.
Abstract
AI models are constantly evolving, with new versions released frequently. Human-AI interaction guidelines encourage notifying users about changes in model capabilities, ideally supported by thorough benchmarking. However, as AI systems integrate into domain-specific workflows, exhaustive benchmarking can become impractical, often resulting in silent or minimally communicated updates. This raises critical questions: Can users notice these updates? What cues do they rely on to distinguish between models? How do such changes affect their behavior and task performance? We address these questions through two studies in the context of facial recognition for historical photo identification: an online experiment examining users' ability to detect model updates, followed by a diary study exploring perceptions in a real-world deployment. Our findings highlight challenges in noticing AI model updates, their impact on downstream user behavior and performance, and how they lead users to develop divergent folk theories. Drawing on these insights, we discuss strategies for effectively communicating model updates in AI-infused systems.
