The Impact of Imperfect XAI on Human-AI Decision-Making

Katelyn Morrison; Philipp Spitzer; Violet Turri; Michelle Feng; Niklas Kühl; Adam Perer

TL;DR

This study investigates how imperfect XAI (explanations that may be incorrect, even when the AI advice is correct) shapes human-AI decision-making, comparing expert and non-expert users across two explanation modalities: natural language and example-based. Using a mixed-methods bird species identification task, it introduces the Deception of Reliance (DoR) metric and extends the Appropriateness of Reliance framework by adding XAI advice as a moderating dimension. Key findings show that expertise moderates reliance when explanations are correct, and that example-based explanations are more deceptive than natural language explanations, affecting both reliance and team performance. The work provides design guidelines for deploying imperfect XAI in CSCW and HCI contexts, covering when to rely on AI, how to present explanations, and how to account for expertise in real-world decision-making workflows.

Abstract

Explainability techniques are rapidly being developed to improve human-AI decision-making across various cooperative work settings. Consequently, previous research has evaluated how decision-makers collaborate with imperfect AI by investigating appropriate reliance and task performance with the aim of designing more human-centered computer-supported collaborative tools. Several human-centered explainable AI (XAI) techniques have been proposed in hopes of improving decision-makers' collaboration with AI; however, these techniques are grounded in findings from previous studies that primarily focus on the impact of incorrect AI advice. Few studies acknowledge the possibility of the explanations being incorrect even if the AI advice is correct. Thus, it is crucial to understand how imperfect XAI affects human-AI decision-making. In this work, we contribute a robust, mixed-methods user study with 136 participants to evaluate how incorrect explanations influence humans' decision-making behavior in a bird species identification task, taking into account their level of expertise and an explanation's level of assertiveness. Our findings reveal the influence of imperfect XAI and humans' level of expertise on their reliance on AI and human-AI team performance. We also discuss how explanations can deceive decision-makers during human-AI collaboration. Hence, we shed light on the impacts of imperfect XAI in the field of computer-supported cooperative work and provide guidelines for designers of human-AI collaboration systems.

Paper Structure (38 sections, 6 equations, 12 figures, 6 tables)


Introduction
Related Work
Decision-Making with Imperfect AI/XAI
Domain Expertise & Human-AI Complementarity
Explanation Modality
Theoretical Development
Methodology
Task Domain: Bird Species Identification
Study Design
Data Selection
Explanation Modalities
Natural Language Explanations
Example-Based Explanations
Assertiveness of Explanations
Recruitment
...and 23 more sections

Figures (12)

Figure 1: Different paths that human decision-makers could follow based on receiving AI and XAI advice. This figure extends the one presented by Schemmer et al. (2023) by adding the XAI advice dimension. The XAI advice is simplified into correct and incorrect explanations. The green checkmarks represent correct advice/decisions, while the red 'x' represents incorrect advice/decisions.
Figure 2: Research model for collaborating with imperfect XAI systems. We analyze how the level of expertise and assertiveness moderate the effect of explanation correctness on Relative AI Reliance (RAIR) and Relative Self-Reliance (RSR).
Figure 3: We conduct a rigorous mixed-methods study leveraging a mixed design. Before participants start the task, they are shown a screening test (A). For the human-AI bird identification task, participants are assigned an explanation modality (B). During the task, participants are shown explanations with different levels of assertiveness and different scenarios of correctness (C). Lastly, participants complete a post-survey (D).
Figure 4: Example of the two phases for a single bird image that a participant is shown in the study. This specifically shows a Magnolia Warbler (correct prediction, correct explanation), and this participant is assigned to the example-based explanations. For this bird, the participant is shown an assertive explanation.
Figure 5: Representative examples of the example-based and natural language explanations for each scenario: CC, CI, IC, and II. The class of the example-based images in the explanation is not shown to participants during the study. The red and green coloring on the natural language explanations was not shown during the study. This is only provided in the figure to guide the reader. The natural language explanation for the Cerulean Warbler is incorrect because this bird species does not have a grey head. The natural language explanation for the Yellow-billed Cuckoo is incorrect because this bird species is brown with a white belly, has a gold and black beak, and does not have a black wing.
...and 7 more figures
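
The captions above refer to RAIR and RSR and to a two-phase task in which a participant decides before and after seeing the advice. The following is a minimal sketch of how those two ratios could be computed and split by explanation correctness. It assumes the definitions from the Appropriateness of Reliance framework of Schemmer et al. (2023): RAIR is the share of cases with a wrong initial decision and correct AI advice in which the final decision is correct, and RSR is the share of cases with a correct initial decision and incorrect AI advice in which the final decision is correct. The `Trial` record, its field names, and the helper functions are hypothetical and are not taken from the paper. The DoR metric is not defined on this page, so it is not sketched here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Trial:
    """One two-phase decision. All field names are hypothetical."""
    initial_correct: bool      # human decision before seeing any advice
    ai_correct: bool           # whether the AI prediction was correct
    explanation_correct: bool  # whether the explanation was correct
    final_correct: bool        # human decision after seeing AI and XAI advice


def _ratio(hits: int, total: int) -> Optional[float]:
    # Return None when no trial is eligible, instead of dividing by zero.
    return hits / total if total else None


def rair(trials: Iterable[Trial]) -> Optional[float]:
    """Relative AI Reliance (assumed definition, see lead-in)."""
    eligible = [t for t in trials if not t.initial_correct and t.ai_correct]
    return _ratio(sum(t.final_correct for t in eligible), len(eligible))


def rsr(trials: Iterable[Trial]) -> Optional[float]:
    """Relative Self-Reliance (assumed definition, see lead-in)."""
    eligible = [t for t in trials if t.initial_correct and not t.ai_correct]
    return _ratio(sum(t.final_correct for t in eligible), len(eligible))


def by_explanation(trials: Iterable[Trial]) -> Dict[str, Dict[str, Optional[float]]]:
    """Compute RAIR and RSR separately for correct and incorrect explanations."""
    all_trials: List[Trial] = list(trials)
    result = {}
    for label, flag in (("correct", True), ("incorrect", False)):
        subset = [t for t in all_trials if t.explanation_correct is flag]
        result[label] = {"RAIR": rair(subset), "RSR": rsr(subset)}
    return result


# Illustrative, made-up trials (not data from the study).
example_trials = [
    Trial(False, True, True, True),    # switched to correct AI advice
    Trial(False, True, True, False),   # kept a wrong initial decision
    Trial(False, True, False, False),  # kept a wrong initial decision
    Trial(True, False, False, True),   # kept a correct initial decision
    Trial(True, False, False, False),  # followed incorrect AI advice
    Trial(True, True, True, True),     # not eligible for either ratio
]
summary = by_explanation(example_trials)
```

Comparing the two entries of `summary` is one simple way to see whether reliance shifts when the explanation is incorrect. A full analysis would additionally condition on expertise and assertiveness, as the research model in Figure 2 describes.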
