The Impact of Imperfect XAI on Human-AI Decision-Making
Katelyn Morrison, Philipp Spitzer, Violet Turri, Michelle Feng, Niklas Kühl, Adam Perer
TL;DR
This study investigates how imperfect XAI (explanations that may misalign with AI predictions) shapes human-AI decision-making, focusing on expert versus non-expert users and two explanation modalities (natural language and example-based). Using a mixed-methods user study built around a bird species identification task, it introduces the Deception of Reliance (DoR) metric and extends the Appropriateness of Reliance framework by adding XAI advice as a moderating dimension. Key findings show that expertise moderates reliance when explanations are correct, and that example-based explanations are more deceptive than natural language explanations, affecting both reliance and team performance. The work provides design guidelines for deploying imperfect XAI in CSCW and HCI contexts, including when to rely on AI, how to present explanations, and how to account for users' expertise in real-world decision-making workflows.
Abstract
Explainability techniques are rapidly being developed to improve human-AI decision-making across various cooperative work settings. Consequently, previous research has evaluated how decision-makers collaborate with imperfect AI by investigating appropriate reliance and task performance with the aim of designing more human-centered computer-supported collaborative tools. Several human-centered explainable AI (XAI) techniques have been proposed in hopes of improving decision-makers' collaboration with AI; however, these techniques are grounded in findings from previous studies that primarily focus on the impact of incorrect AI advice. Few studies acknowledge the possibility of the explanations being incorrect even if the AI advice is correct. Thus, it is crucial to understand how imperfect XAI affects human-AI decision-making. In this work, we contribute a robust, mixed-methods user study with 136 participants to evaluate how incorrect explanations influence humans' decision-making behavior in a bird species identification task, taking into account their level of expertise and an explanation's level of assertiveness. Our findings reveal the influence of imperfect XAI and humans' level of expertise on their reliance on AI and human-AI team performance. We also discuss how explanations can deceive decision-makers during human-AI collaboration. Hence, we shed light on the impacts of imperfect XAI in the field of computer-supported cooperative work and provide guidelines for designers of human-AI collaboration systems.
