Beware of "Explanations" of AI

David Martens; Galit Shmueli; Theodoros Evgeniou; Kevin Bauer; Christian Janiesch; Stefan Feuerriegel; Sebastian Gabel; Sofie Goethals; Travis Greene; Nadja Klein; Mathias Kraus; Niklas Kühl; Claudia Perlich; Wouter Verbeke; Alona Zharova; Patrick Zschech; Foster Provost

TL;DR

The paper addresses the problem that explanations for AI decisions are not universally beneficial and can cause harm if poorly designed. It argues for a socio-technical approach that places explanations in the context of stakeholder goals, mental models, and regulatory environments. It catalogs common explanation types (post-hoc, global/local, feature importance, counterfactuals) and highlights root causes of poor explanations (unfaithfulness, irrelevance, instability, etc.). The authors call for interdisciplinary research, context-aware evaluation, and policy-facing safeguards to ensure explanations support safe, responsible AI adoption.

Abstract

Understanding the decisions made and actions taken by increasingly complex AI systems remains a key challenge. This has led to an expanding field of research in explainable artificial intelligence (XAI), highlighting the potential of explanations to enhance trust, support adoption, and meet regulatory standards. However, the question of what constitutes a "good" explanation depends on the goals, stakeholders, and context. At a high level, psychological insights such as the concept of mental model alignment can offer guidance, but success in practice is challenging due to social and technical factors. As a result of this ill-defined nature of the problem, explanations can be of poor quality (e.g., unfaithful, irrelevant, or incoherent), potentially leading to substantial risks. Instead of fostering trust and safety, poorly designed explanations can actually cause harm, including wrong decisions, privacy violations, manipulation, and even reduced AI adoption. Therefore, we caution stakeholders to beware of explanations of AI: while they can be vital, they are not automatically a remedy for transparency or responsible AI adoption, and their misuse or limitations can exacerbate harm. Attention to these caveats can help guide future research to improve the quality and impact of AI explanations.
