Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models

Wojciech Samek, Thomas Wiegand, Klaus-Robert Müller

TL;DR

This paper advocates for explainable AI by detailing two explanation methods, Sensitivity Analysis (SA) and Layer-Wise Relevance Propagation (LRP), together with a framework for evaluating explanation quality. It demonstrates that LRP provides more interpretable, conservation-based attributions than gradient-based SA across image, text, and video tasks, supported by a software toolbox. The authors argue that explainability supports verification, improvement, and compliance with emerging regulations, and they outline directions for deeper theoretical grounding and broader domain adoption. Overall, the work positions explainability as a practical necessity for trustworthy and scientifically insightful AI systems.

Abstract

With the availability of large databases and recent improvements in deep learning methodology, the performance of AI systems is reaching or even exceeding the human level on an increasing number of complex tasks. Impressive examples of this development can be found in domains such as image classification, sentiment analysis, speech understanding or strategic game playing. However, because of their nested non-linear structure, these highly successful machine learning and artificial intelligence models are usually applied in a black box manner, i.e., no information is provided about what exactly makes them arrive at their predictions. Since this lack of transparency can be a major drawback, e.g., in medical applications, the development of methods for visualizing, explaining and interpreting deep learning models has recently attracted increasing attention. This paper summarizes recent developments in this field and makes a plea for more interpretability in artificial intelligence. Furthermore, it presents two approaches to explaining predictions of deep learning models, one method which computes the sensitivity of the prediction with respect to changes in the input and one approach which meaningfully decomposes the decision in terms of the input variables. These methods are evaluated on three classification tasks.
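The difference between the two approaches named in the abstract can be made concrete on a toy model. The following is a minimal sketch, not the authors' implementation: it assumes a hypothetical bias-free two-layer ReLU network with hand-picked weights (`W1`, `w2`), uses the absolute partial derivative as the sensitivity score, and uses a simple proportional redistribution rule with a small stabilizer `eps` for the relevance decomposition. All names and numbers are illustrative.

```python
import numpy as np

def forward(x, W1, w2):
    """Bias-free two-layer ReLU network: f(x) = w2 . relu(W1 x)."""
    z1 = W1 @ x
    a1 = np.maximum(z1, 0.0)
    return z1, a1, float(w2 @ a1)

def stabilize(z, eps=1e-9):
    """Push values away from zero so the divisions below stay finite."""
    return z + eps * np.where(z >= 0, 1.0, -1.0)

def sensitivity_analysis(x, W1, w2):
    """SA sketch: score each input by the magnitude of df/dx_i."""
    z1, _, _ = forward(x, W1, w2)
    grad = W1.T @ (w2 * (z1 > 0))  # backprop through the ReLU mask
    return np.abs(grad)

def lrp(x, W1, w2):
    """LRP sketch: redistribute f(x) backwards in proportion to contributions."""
    z1, a1, f = forward(x, W1, w2)
    # Output -> hidden: each hidden neuron receives its share of f(x).
    contrib_hidden = a1 * w2
    r_hidden = contrib_hidden / stabilize(contrib_hidden.sum()) * f
    # Hidden -> input: contrib_input[j, i] = W1[j, i] * x[i], row j sums to z1[j].
    contrib_input = W1 * x
    shares = contrib_input / stabilize(z1)[:, None]
    return (shares * r_hidden[:, None]).sum(axis=0)

# Hypothetical weights and input, chosen only for illustration.
W1 = np.array([[1.0, -1.0],
               [2.0,  1.0]])
w2 = np.array([1.0, 0.5])
x = np.array([1.0, 0.5])

_, _, f_x = forward(x, W1, w2)
sa_scores = sensitivity_analysis(x, W1, w2)
lrp_scores = lrp(x, W1, w2)

print("f(x)       =", f_x)
print("SA scores  =", sa_scores)
print("LRP scores =", lrp_scores, "sum =", lrp_scores.sum())
```

In this sketch the SA scores are non-negative and say only how strongly the output would react to a change in each input, whereas the LRP scores are signed and add up (to within `eps`) to the prediction `f(x)`, which is the conservation property the TL;DR refers to.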

Paper Structure

This paper contains 12 sections, 4 equations, 2 figures.

Figures (2)

  • Figure 1: Explaining predictions of an AI system. The input image is correctly classified as "rooster". In order to understand why the system has arrived at this decision, explanation methods such as SA or LRP are applied. The result of this explanation is an image, the heatmap, which visualizes the importance of each pixel for the prediction. In this example the rooster's red comb and wattle are the basis for the AI system's decision. With the heatmap one can verify that the AI system works as intended.
  • Figure 2: Explaining predictions of AI systems. (A) shows the application of explanation methods to image classification. The SA heatmaps are noisy and difficult to interpret, whereas LRP heatmaps match human intuition. (B) shows the application of explanation methods to text document classification. The SA and LRP heatmaps identify words such as "discomfort", "body" and "sickness" as the relevant ones for explaining the prediction "sci.med". In contrast to SA, LRP distinguishes between positive (red) and negative (blue) relevance. (C) shows explanations for a human action recognition classifier based on motion vector features. The LRP heatmaps of a video which was classified as "sit-up" show increased relevance on frames in which the person is performing an upwards and downwards movement.