The Duet of Representations and How Explanations Exacerbate It
Charles Wan, Rodrigo Belo, Leid Zejnilović, Susana Lavado
TL;DR
This paper investigates how explanations of AI predictions interact with humans' preexisting beliefs. It shows that explanations can worsen decision quality when they foreground features on which the human holds a conflicting prior. The paper formalizes human priors and the causal representations that an algorithm induces, and it treats explanations as compressed representations that direct attention. It tests these ideas in a field experiment at a public employment service, where counselors used an XGBoost predictor of long-term unemployment (LTU) risk and the treatment group also received SHAP explanations. LASSO-based methods identify conflicts (e.g., a prior about college education that conflicts with the model's representation of its relation to LTU risk), and regression analyses estimate the effects on decision quality and confidence. Exposing conflicting features degrades performance, especially when counselors adjust the model's predictions. The work argues for a broader notion of communicative rationality in human-AI interaction. It offers four desiderata (understanding, reciprocity, negotiability, and shared reality) to bridge epistemic gaps and improve practical outcomes.
Abstract
An algorithm effects a causal representation of relations between features and labels in the human's perception. Such a representation might conflict with the human's prior belief. Explanations can direct the human's attention to the conflicting feature and away from other relevant features. This leads to causal overattribution and may adversely affect the human's information processing. In a field experiment, we implemented an XGBoost-trained model as a decision-making aid for counselors at a public employment service to predict candidates' risk of long-term unemployment. The treatment group of counselors was also provided with SHAP explanations of the model's predictions. The results show that the quality of the human's decision-making is worse when a feature on which the human holds a conflicting prior belief is displayed as part of the explanation.
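The following minimal sketch illustrates the mechanism described above. It is not the authors' pipeline. It computes exact Shapley-value attributions of the kind SHAP reports, by brute force with a single baseline reference, for a hypothetical three-feature risk score. For real tree models such as XGBoost, the `shap` package's `TreeExplainer` computes these values efficiently. The sketch then keeps only the top-k attributions as the "displayed" explanation, which is one simple way to compress it. Finally, it flags displayed features whose attribution sign contradicts an assumed counselor prior. The scoring function, coefficients, baseline, top-k display rule, and sign-based definition of "conflict" are all illustrative assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x (exponential cost: toy use only).

    Features outside a coalition are set to their baseline value, i.e. a
    single-reference value function (an assumption; SHAP offers other choices).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def displayed_conflicts(phi, names, prior_signs, k):
    """Show the k largest |attributions|; flag shown features contradicting a prior.

    prior_signs maps a feature name to the sign (+1/-1) of the effect the human
    believes it has on risk (hypothetical encoding of a prior belief).
    """
    order = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)[:k]
    shown = [names[i] for i in order]
    conflicts = [
        names[i] for i in order
        if names[i] in prior_signs and phi[i] * prior_signs[names[i]] < 0
    ]
    return shown, conflicts


def toy_risk(z):
    # Hypothetical LTU risk score with made-up coefficients (not the paper's model).
    age, months_unemployed, college = z
    return (0.01 * age + 0.02 * months_unemployed + 0.10 * college
            + 0.005 * college * months_unemployed)


names = ["age", "months_unemployed", "college_education"]
x = [50, 12, 1]          # hypothetical candidate
baseline = [40, 6, 0]    # hypothetical reference candidate
prior_signs = {"college_education": -1}  # assumed belief: college lowers risk

phi = shapley_values(toy_risk, x, baseline)
shown, conflicts = displayed_conflicts(phi, names, prior_signs, k=2)
print([round(p, 3) for p in phi])  # approximately [0.1, 0.135, 0.145]
print(shown, conflicts)
```

The attributions sum to `toy_risk(x) - toy_risk(baseline)` (the efficiency property). Changing `k` or the instance determines whether the conflicting feature reaches the displayed explanation at all. That contrast is the one on which the abstract's result turns.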
