Human-Centered Evaluation of XAI Methods
Karam Dawoud, Wojciech Samek, Peter Eisert, Sebastian Lapuschkin, Sebastian Bosse
TL;DR
The paper addresses the challenge of making AI explanations interpretable to end users by directly comparing three local XAI methods, Prototypical Part Network (ProtoPNet), Occlusion, and Layer-wise Relevance Propagation (LRP), against a human-generated baseline in a controlled image-recognition task. Using a ClickMe-based human baseline and a dataset of 102 images, the study shows that although the highlighted regions differ across methods, the resulting interpretability for humans is broadly similar, and no single method consistently outperforms the others. ProtoPNet yields higher classification accuracy but requires more image exposure, whereas LRP provides efficient, informative explanations with lower exposure; Occlusion lags in recognition performance but offers simple, fast explanations. The findings support a human-centered approach to XAI evaluation and suggest that adaptive method selection or recommender systems could tailor explanations to users and tasks, which has implications for deploying transparent AI in practice.
Abstract
A critical challenge in Artificial Intelligence has been to decipher the decision-making processes inside the so-called "black boxes" of deep learning. In recent years, many methods have emerged for explaining decisions across diverse tasks. In tasks such as image classification, these methods typically identify and emphasize the pixels that most influence a classifier's prediction. This approach mirrors human behavior: when asked to explain our rationale for classifying an image, we often point to its most salient features or aspects. Building on this parallel, we conducted a user-centric study to objectively measure the interpretability of three leading explanation methods: (1) Prototypical Part Network, (2) Occlusion, and (3) Layer-wise Relevance Propagation. Our results show that while the regions highlighted by these methods can vary widely, they all offer humans a nearly equivalent depth of understanding. This enables users to discern and categorize images efficiently, reinforcing the value of these methods in enhancing AI transparency.
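
To make the idea of "pixels that most influence a classifier's prediction" concrete, the following is a minimal sketch of occlusion-style attribution: a patch of the image is masked, and the resulting drop in the classifier's score is recorded as that patch's relevance. This is an illustration under stated assumptions, not the implementation used in the study. The names `occlusion_map` and `toy_score`, the patch size, the stride, and the fill value are all hypothetical, and the toy scoring function stands in for a real classifier.

```python
import numpy as np

def occlusion_map(image, score_fn, patch=4, stride=4, fill=0.0):
    """Return a heatmap of score drops caused by masking each patch.

    image:    2D (H, W) or 3D (H, W, C) array
    score_fn: callable mapping an image to a scalar class score (assumed)
    """
    h, w = image.shape[:2]
    base = score_fn(image)                 # score on the unmodified image
    heat = np.zeros((h, w), dtype=float)   # accumulated score drops
    counts = np.zeros((h, w), dtype=float) # how often each pixel was masked
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill
            drop = base - score_fn(occluded)
            heat[top:top + patch, left:left + patch] += drop
            counts[top:top + patch, left:left + patch] += 1
    # Average over overlapping patches; pixels never masked keep relevance 0.
    return heat / np.maximum(counts, 1)

# Hypothetical stand-in for a classifier: the score depends only on the
# mean brightness of the top-left 4x4 corner of the image.
def toy_score(img):
    return float(img[:4, :4].mean())

image = np.ones((8, 8))
heat = occlusion_map(image, toy_score)
```

In this toy setting only the top-left quadrant receives nonzero relevance, because it is the only region the stand-in score depends on. With a real classifier, `score_fn` would wrap a forward pass and return the probability or logit of the class being explained.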
