
Manipulating and Measuring Model Interpretability

Forough Poursabzi-Sangdeh, Daniel G. Goldstein, Jake M. Hofman, Jennifer Wortman Vaughan, Hanna Wallach

TL;DR

A sequence of pre-registered experiments showed participants functionally identical models that varied only in two factors commonly thought to make machine learning models more or less interpretable: the number of features and the transparency of the model (i.e., whether the model internals are clear or black box).

Abstract

With machine learning models being increasingly used to aid decision making even in high-stakes domains, there has been a growing interest in developing interpretable models. Although many supposedly interpretable models have been proposed, there have been relatively few experimental studies investigating whether these models achieve their intended effects, such as making people more closely follow a model's predictions when it is beneficial for them to do so or enabling them to detect when a model has made a mistake. We present a sequence of pre-registered experiments (N=3,800) in which we showed participants functionally identical models that varied only in two factors commonly thought to make machine learning models more or less interpretable: the number of features and the transparency of the model (i.e., whether the model internals are clear or black box). Predictably, participants who saw a clear model with few features could better simulate the model's predictions. However, we did not find that participants more closely followed its predictions. Furthermore, showing participants a clear model meant that they were less able to detect and correct for the model's sizable mistakes, seemingly due to information overload. These counterintuitive findings emphasize the importance of testing over intuition when developing interpretable models.

Paper Structure

This paper contains 28 sections, 21 figures, 18 tables.

Figures (21)

  • Figure 1: The four primary experimental conditions. In the conditions in the top row, the model used two features; in the conditions in the bottom row, the model used eight. In the conditions on the left, the model internals were clear; in the conditions on the right, the model internals were black box.
  • Figure 2: Part of the testing phase from our first experiment.
  • Figure 3: Results from our first experiment: density plots for participants' (a) mean simulation errors and (b) mean deviations from the model's predictions. Numbers in each subplot indicate average values over all participants in the corresponding condition, while error bars indicate one standard error.
  • Figure 5: Participants' mean predictions of apartment 12's selling price in (a) our first experiment and (b) our second experiment. Horizontal lines indicate the models' predictions and error bars indicate one standard error.
  • Figure 6: Density plots for participants' mean prediction errors in our first experiment (left) and in our second experiment (right). Numbers in each subplot indicate average values over all participants in the corresponding condition, while error bars indicate one standard error. Vertical lines indicate the model's mean prediction error.
  • ...and 16 more figures
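Figures 3 and 6 summarize each participant by a mean simulation error, a mean deviation from the model's predictions, and a mean prediction error. The captions do not define these quantities. The sketch below is an assumption, not the authors' analysis code: it treats each quantity as a mean absolute difference over the apartments a participant saw. The function names and the toy inputs are hypothetical illustrations, not data from the study.

```python
from statistics import mean


def mean_abs_diff(xs, ys):
    """Mean absolute difference between two equal-length, non-empty sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    return mean(abs(x - y) for x, y in zip(xs, ys))


def participant_metrics(model_preds, guessed_model_preds, participant_preds, actual_prices):
    """Hypothetical per-participant summaries (assumed definitions, one entry per apartment)."""
    return {
        # Assumed: how far the participant's guess of the model's output was from that output.
        "simulation_error": mean_abs_diff(guessed_model_preds, model_preds),
        # Assumed: how far the participant's own prediction strayed from the model's prediction.
        "deviation": mean_abs_diff(participant_preds, model_preds),
        # Assumed: how far the participant's own prediction was from the actual selling price.
        "prediction_error": mean_abs_diff(participant_preds, actual_prices),
    }


# Toy inputs in arbitrary price units (not from the study).
metrics = participant_metrics(
    model_preds=[300, 450],
    guessed_model_preds=[320, 440],
    participant_preds=[310, 400],
    actual_prices=[290, 420],
)
print(metrics)
```

Under these assumptions, a low simulation error means the participant could reproduce the model's output, while a low deviation means the participant followed it; the two can diverge, which is the distinction the abstract draws between simulating a model and following it.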