The Persuasion Paradox: When LLM Explanations Fail to Improve Human-AI Team Performance

Ruth Cohen, Lu Feng, Ayala Bloch, Sarit Kraus

Abstract

While natural-language explanations from large language models (LLMs) are widely adopted to improve transparency and trust, their impact on objective human-AI team performance remains poorly understood. We identify a Persuasion Paradox: fluent explanations systematically increase user confidence and reliance on AI without reliably improving, and in some cases undermining, task accuracy. Across three controlled human-subject studies spanning abstract visual reasoning (RAVEN matrices) and deductive logical reasoning (LSAT problems), we disentangle the effects of AI predictions and explanations using a multi-stage reveal design and between-subjects comparisons. In visual reasoning, LLM explanations increase confidence but do not improve accuracy beyond the AI prediction alone, and substantially suppress users' ability to recover from model errors. Interfaces exposing model uncertainty via predicted probabilities, as well as a selective automation policy that defers uncertain cases to humans, achieve significantly higher accuracy and error recovery than explanation-based interfaces. In contrast, for language-based logical reasoning tasks, LLM explanations yield the highest accuracy and recovery rates, outperforming both expert-written explanations and probability-based support. This divergence reveals that the effectiveness of narrative explanations is strongly task-dependent and mediated by cognitive modality. Our findings demonstrate that commonly used subjective metrics such as trust, confidence, and perceived clarity are poor predictors of human-AI team performance. Rather than treating explanations as a universal solution, we argue for a shift toward interaction designs that prioritize calibrated reliance and effective error recovery over persuasive fluency.

Paper Structure

This paper contains 22 sections, 7 figures, 1 table.

Figures (7)

  • Figure 1: The multi-stage human-AI decision process used to separate the effects of the AI prediction and its explanation on user decisions.
  • Figure 2: Overall accuracy and self-reported confidence across the three reveal stages in the multi-stage RAVEN study. Accuracy improves after the AI prediction but plateaus once explanations are introduced, while confidence increases only after explanations are presented.
  • Figure 3: Relationship between agreement with correct AI predictions and recovery from incorrect AI predictions across support conditions in the RAVEN task. The highlighted lower-right region indicates high agreement paired with low recovery, reflecting error masking behavior.
  • Figure 4: Objective accuracy across human-AI support conditions in the RAVEN task, including a derived selective automation policy for comparison.
  • Figure 5: Illustrative user interface examples from the LSAT study.
  • ...and 2 more figures