Explaining Explanations: An Overview of Interpretability of Machine Learning

Leilani H. Gilpin; David Bau; Ben Z. Yuan; Ayesha Bajwa; Michael Specter; Lalana Kagal

TL;DR

This survey clarifies the definitions of interpretability and explainability, then introduces a three-part taxonomy for explaining deep networks: processing explanations, representation explanations, and explanation-producing systems. It reviews concrete techniques such as LIME, salience maps, transfer-learning representations, CAVs, attention mechanisms, and language-based explanations, while highlighting challenges in faithfulness, completeness, and evaluation. The authors propose structured evaluation criteria across processing, representation, and explanation-producing methods and advocate integrating approaches across categories to improve trust and practical impact. The work aims to standardize best practices and guide future research toward safe, fair, and usable AI explanations in both research and real-world applications.

Abstract

There has recently been a surge of work in explanatory artificial intelligence (XAI). This research area tackles the important problem that complex machines and algorithms often cannot provide insights into their behavior and thought processes. XAI allows users and parts of the internal system to be more transparent, providing explanations of their decisions in some level of detail. These explanations are important to ensure algorithmic fairness, to identify potential bias or problems in the training data, and to ensure that the algorithms perform as expected. However, explanations produced by these systems are neither standardized nor systematically assessed. In an effort to create best practices and identify open challenges, we provide our definition of explainability and show how it can be used to classify existing literature. We discuss why current approaches to explanatory methods, especially for deep neural networks, are insufficient. Finally, based on our survey, we conclude with suggested future research directions for explanatory artificial intelligence.
