Model-Agnostic Interpretability of Machine Learning

Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin

TL;DR

The paper argues for model-agnostic interpretability, which decouples explanations from the model so that powerful predictors can be used freely while still providing faithful, user-tailored explanations. It formalizes the LIME approach as a local surrogate model that approximates any black-box classifier around a given instance, guided by locality and simplicity constraints. The authors highlight benefits such as representation flexibility, lower switching costs, and cross-model comparability, while acknowledging challenges in achieving global faithfulness and in extending the approach to all data modalities. Overall, model-agnostic explanations are presented as a practical path toward more trustworthy and usable AI systems, with LIME as a concrete, adaptable implementation.
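
The paper's single equation is not reproduced in this summary. For orientation, the LIME objective as it is usually stated can be written as below; the symbols $\xi$, $G$, $\mathcal{L}$, $\pi_x$, and $\Omega$ come from the LIME literature rather than from the text on this page, so treat the notation as an assumption.

$$
\xi(x) = \operatorname*{argmin}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g)
$$

Here $f$ is the black-box model, $g$ is a candidate explanation drawn from a class $G$ of interpretable models, $\pi_x$ is a proximity measure that defines the neighborhood around the instance $x$ (the locality constraint), $\mathcal{L}$ measures how unfaithful $g$ is to $f$ in that neighborhood, and $\Omega(g)$ penalizes the complexity of the explanation (the simplicity constraint).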

Abstract

Understanding why machine learning models behave the way they do empowers both system designers and end-users in many ways: in model selection, feature engineering, in order to trust and act upon the predictions, and in more intuitive user interfaces. Thus, interpretability has become a vital concern in machine learning, and work in the area of interpretable models has found renewed interest. In some applications, such models are as accurate as non-interpretable ones, and thus are preferred for their transparency. Even when they are not accurate, they may still be preferred when interpretability is of paramount importance. However, restricting machine learning to interpretable models is often a severe limitation. In this paper we argue for explaining machine learning predictions using model-agnostic approaches. By treating the machine learning models as black-box functions, these approaches provide crucial flexibility in the choice of models, explanations, and representations, improving debugging, comparison, and interfaces for a variety of users and models. We also outline the main challenges for such methods, and review a recently-introduced model-agnostic explanation approach (LIME) that addresses these challenges.

Paper Structure

This paper contains 10 sections, 1 equation, 2 figures.

Figures (2)

  • Figure 1: Toy example to present intuition for LIME. The black-box model's complex decision function $f$ (unknown to LIME) is represented by the blue/pink background. The bright bold red cross is the instance being explained. LIME samples instances, gets predictions using $f$, and weighs them by the proximity to the instance being explained (represented here by size). The dashed line is the explanation that is locally (but not globally) faithful.
  • Figure 2: Explaining sentiment predictions for the sentence "This is not bad.", using different models and representations.
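
The procedure described in the Figure 1 caption (sample instances, query $f$, weigh by proximity, fit a simple local model) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `black_box` is a hypothetical stand-in for $f$, and the Gaussian sampling, the exponential proximity kernel, and the parameter names (`sample_scale`, `kernel_width`) are all choices made here for the example.

```python
import numpy as np


def black_box(X):
    """Hypothetical nonlinear model f: returns a score in (0, 1) per row."""
    X = np.atleast_2d(X)
    logits = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 - 0.5
    return 1.0 / (1.0 + np.exp(-logits))


def explain_locally(f, x, n_samples=2000, sample_scale=1.0,
                    kernel_width=0.5, seed=0):
    """Fit a proximity-weighted linear surrogate to f around x.

    Returns (intercept, slopes), where slopes[i] is the local effect of
    feature i on the prediction of f near x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)

    # 1. Sample instances around the point being explained (assumed Gaussian).
    Z = x + rng.normal(scale=sample_scale, size=(n_samples, x.size))

    # 2. Get predictions from the black box; only its outputs are used.
    y = f(Z)

    # 3. Weigh each sample by proximity to x (assumed exponential kernel).
    sq_dist = np.sum((Z - x) ** 2, axis=1)
    weights = np.exp(-sq_dist / kernel_width ** 2)

    # 4. Fit a weighted linear model via least squares on centered features,
    #    so the intercept estimates the surrogate's prediction at x itself.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sqrt_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coef[0], coef[1:]


if __name__ == "__main__":
    intercept, slopes = explain_locally(black_box, [0.0, 1.0])
    print("local intercept:", intercept)
    print("local feature effects:", slopes)
```

The resulting linear model plays the role of the dashed line in Figure 1: it is meant to be faithful only near the chosen instance, and explaining a different instance may give quite different slopes. A narrower `kernel_width` makes the surrogate more local at the cost of relying on fewer effectively weighted samples.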