MOUNTAINEER: Topology-Driven Visual Analytics for Comparing Local Explanations

Parikshit Solunke, Vitoria Guardieiro, Joao Rulff, Peter Xenopoulos, Gromit Yeuk-Yin Chan, Brian Barr, Luis Gustavo Nonato, Claudio Silva

TL;DR

A novel topology-driven visual analytics tool that lets ML practitioners interactively analyze and compare topological representations of local explanations by linking the topological graphs back to the original data distribution, model predictions, and feature attributions; it can also be used to compare and understand ML models themselves.

Abstract

With the increasing use of black-box Machine Learning (ML) techniques in critical applications, there is a growing demand for methods that can provide transparency and accountability for model predictions. As a result, a large number of local explainability methods for black-box models have been developed and popularized. However, machine learning explanations are still hard to evaluate and compare due to the high dimensionality, heterogeneous representations, varying scales, and stochastic nature of some of these methods. Topological Data Analysis (TDA) can be an effective method in this domain because it can transform attributions into uniform graph representations, providing a common ground for comparison across different explanation methods. We present a novel topology-driven visual analytics tool, Mountaineer, that allows ML practitioners to interactively analyze and compare these representations by linking the topological graphs back to the original data distribution, model predictions, and feature attributions. Mountaineer facilitates rapid and iterative exploration of ML explanations, enabling experts to gain deeper insights into the explanation techniques, understand the underlying data distributions, and thus reach well-founded conclusions about model behavior. Furthermore, we demonstrate the utility of Mountaineer through two case studies using real-world data. In the first, we show how Mountaineer enabled us to compare black-box ML explanations and discern the regions and causes of disagreement between different explanations. In the second, we demonstrate how the tool can be used to compare and understand ML models themselves. Finally, we conducted interviews with three industry experts to help us evaluate our work.
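The claim that TDA can turn attributions into uniform graph representations rests on the Mapper construction outlined in the Figure 2 caption. What follows is a minimal plain-Python sketch of that construction, not the authors' implementation. It rests on several assumptions. Each sample's attribution vector is a tuple of floats. The `lens` is any scalar per sample; the model's predicted probability is one possible choice, assumed here for illustration. The lens range is covered by equal-length overlapping intervals. A simple single-linkage clustering with threshold `eps` stands in for whatever clusterer the tool actually uses. The names `cover_intervals`, `cluster_points`, and `mapper` are hypothetical.

```python
import math
from itertools import combinations


def cover_intervals(lo, hi, n_intervals, overlap):
    """Split [lo, hi] into equal-length intervals, each widened on both sides
    by `overlap` times the base length so that neighbours overlap."""
    length = (hi - lo) / n_intervals
    pad = overlap * length
    return [(lo + i * length - pad, lo + (i + 1) * length + pad)
            for i in range(n_intervals)]


def cluster_points(indices, points, eps):
    """Single-linkage clustering (an assumed stand-in for the tool's clusterer):
    connected components of the graph linking points closer than eps."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining
                    if math.dist(points[i], points[j]) < eps}
            remaining -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper(points, lens, n_intervals=4, overlap=0.25, eps=1.0):
    """Return (nodes, edges): nodes are frozensets of sample indices,
    and edges join nodes that share at least one sample."""
    lo, hi = min(lens), max(lens)
    nodes = []
    for a, b in cover_intervals(lo, hi, n_intervals, overlap):
        # Samples whose lens value falls in this (overlapping) interval.
        members = [i for i, v in enumerate(lens) if a <= v <= b]
        nodes.extend(cluster_points(members, points, eps))
    # Connect clusters that have input points in common.
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges


# Toy attributions: seven samples along a line; the lens is the first coordinate.
attributions = [(x / 2, 0.0) for x in range(7)]
lens = [a[0] for a in attributions]
nodes, edges = mapper(attributions, lens, n_intervals=3, overlap=0.25, eps=0.6)
print(len(nodes), len(edges))  # → 3 2
```

Every explanation method assigns attribution vectors to the same samples. Running this construction on each method's attributions therefore yields node-link graphs of the same kind, which is what makes the methods comparable on common ground.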

Paper Structure

This paper contains 32 sections, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: We use Mountaineer to compare black-box Machine Learning (ML) model explanations for the real-world HELOC dataset [helocdata]. The Projection View (A) shows the original data projected into two dimensions; the user can choose among three different projection algorithms. The Distance Matrix (C) summarizes the distances between the topologies of the explanation methods. When the user selects a cell in the matrix, Mapper Views #1 and #2 (B.1, B.2) update to show the corresponding graph representations. In those views, the user can select nodes to investigate (B.1); the coloring of the other graph then updates to show the density of the selected samples (B.2). The Data View (D) presents the distribution of each feature for the selected observations (in purple) and for all observations (in green), arranged in descending order of the difference between the two. The Feature Attribution View (E) displays the importance value of each feature according to the selected methods, in decreasing order of importance. From the Feature Attribution View, we can infer that, for the selected regions, the two selected explanation methods disagree significantly on feature importance.
  • Figure 2: Mapper algorithm used to create an approximate Reeb graph. The input space is first divided into overlapping intervals based on lens function values. Then, the points within the intervals are clustered into nodes. Subsequently, edges are constructed between clusters that have common input points. Thus, the Mapper output is generated as a node-link graph forming a skeletal representation of the input space.
  • Figure 3: Mountaineer links the data, model predictions, chosen explanation results, and their topological graphs into an interactive visual framework.
  • Figure 4: Graph summarization eliminates visual clutter by reducing redundancy. Nodes A, B, C, and D (original graph) form a connected component and contain the same data points, so they are aggregated into a new node F (optimized graph).
  • Figure 5: Mapper View (column A) and distribution of the feature MSinceMostRecentInqexcl7days (column B) for Vanilla Gradient explanation method in Case Study 1. We select each "side" of the hole and analyze the feature distribution, concluding that one side corresponds to samples with low values for the feature and the other to high values.
  • ...and 3 more figures
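Figure 1's caption says the Data View orders features by the difference between the selected and overall distributions, but it does not name the difference measure. The sketch below assumes the two-sample Kolmogorov-Smirnov statistic (the largest gap between empirical CDFs) as that measure. This is our substitution, not necessarily what Mountaineer uses, and `ecdf_gap` and `rank_features` are hypothetical names. Rows are plain dicts mapping a feature name to a value, and `selected` holds the indices of the observations chosen in a Mapper view.

```python
from bisect import bisect_right


def ecdf_gap(sample, reference):
    """Two-sample Kolmogorov-Smirnov statistic (assumed difference measure):
    the largest vertical gap between the empirical CDFs of the two lists."""
    s, r = sorted(sample), sorted(reference)
    grid = set(s) | set(r)
    return max(abs(bisect_right(s, x) / len(s) - bisect_right(r, x) / len(r))
               for x in grid)


def rank_features(rows, selected, features):
    """Order features by how differently the selected rows are distributed
    compared with all rows, largest difference first."""
    scores = {}
    for f in features:
        all_values = [row[f] for row in rows]
        selected_values = [rows[i][f] for i in selected]
        scores[f] = ecdf_gap(selected_values, all_values)
    return sorted(((f, scores[f]) for f in features),
                  key=lambda item: item[1], reverse=True)


rows = [{"a": 0, "b": 1}, {"a": 0, "b": 2}, {"a": 10, "b": 1}, {"a": 10, "b": 2}]
print(rank_features(rows, selected=[2, 3], features=["a", "b"]))
# → [('a', 0.5), ('b', 0.0)]
```

Feature `a` ranks first because the selected rows hold only its high values. The selection leaves the distribution of feature `b` unchanged.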
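The summarization rule in the Figure 4 caption collapses nodes that form a connected component and contain the same data into a single node. It can be sketched with a union-find structure. This is a minimal sketch under stated assumptions, not Mountaineer's code. Nodes are given as a mapping from a name to the set of sample ids they contain. Only edges whose two endpoints hold identical samples are used for merging. Each merged node is named by joining the original names (e.g. `A+B+C+D`) rather than with a fresh label such as F. `summarize` is a hypothetical helper name.

```python
def summarize(nodes, edges):
    """Collapse each connected group of nodes that hold identical samples.

    nodes: dict mapping node name -> frozenset of sample ids.
    edges: set of (name, name) pairs.
    Returns (merged_nodes, merged_edges); merged nodes are named 'A+B+...'.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Union only along edges whose endpoints contain exactly the same samples.
    for u, v in edges:
        if nodes[u] == nodes[v]:
            parent[find(u)] = find(v)

    # Group original names by root and build a readable label per group.
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    label = {n: "+".join(sorted(g)) for g in groups.values() for n in g}

    merged_nodes = {label[n]: nodes[n] for n in nodes}
    # Keep only edges between different merged nodes, in a canonical order.
    merged_edges = {tuple(sorted((label[u], label[v])))
                    for u, v in edges if label[u] != label[v]}
    return merged_nodes, merged_edges


same = frozenset({1, 2, 3})
graph_nodes = {"A": same, "B": same, "C": same, "D": same, "E": frozenset({3, 4})}
graph_edges = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")}
merged_nodes, merged_edges = summarize(graph_nodes, graph_edges)
print(sorted(merged_nodes))  # → ['A+B+C+D', 'E']
print(merged_edges)          # → {('A+B+C+D', 'E')}
```

Nodes that hold identical samples but are not connected stay separate, which matches the caption's requirement that the aggregated nodes form a connected component.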