SUBPLEX: Towards a Better Understanding of Black Box Model Explanations at the Subpopulation Level

Jun Yuan, Gromit Yeuk-Yin Chan, Brian Barr, Kyle Overton, Kim Rees, Luis Gustavo Nonato, Enrico Bertini, Claudio T. Silva

TL;DR

SUBPLEX addresses the problem of interpreting black-box models at the subpopulation level by proposing steerable subpopulation analysis of local explanations and embedding an interactive visual-analytics pipeline in Jupyter notebooks. It introduces a user-guided distance metric and a feature-scoring scheme to refine clusters, enabling analysts to explore, compare, and export subpopulation-specific explanation patterns. The work is grounded in hierarchical task analysis, detailing design requirements and a three-stage pipeline (Generation, Exploration, Interpretation) supported by five linked views. Through loan-application and tweet-sentiment case studies and expert feedback, SUBPLEX demonstrates improved sensemaking and workflow integration for model interpretability in real-world settings.
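
The TL;DR mentions a user-guided distance metric for refining clusters of local explanations. The sketch below illustrates only the general idea, under stated assumptions. Each instance's local explanation is treated as a vector of feature attributions (for example from LIME or SHAP). The analyst's steering is modeled as non-negative per-feature weights in a weighted Euclidean distance, and clustering is plain k-means. The names `weighted_distance` and `weighted_kmeans` are hypothetical, and this is not SUBPLEX's published metric or code.

```python
import math
from typing import List, Sequence


def weighted_distance(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    """Euclidean distance in which feature i contributes with weight w[i] >= 0 (assumed steering model)."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))


def weighted_kmeans(points: List[List[float]], weights: Sequence[float], k: int, iters: int = 20) -> List[int]:
    """Lloyd's k-means under weighted_distance; returns one cluster label per point."""
    # Deterministic initialization for reproducibility: the first k points seed the centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each explanation joins its nearest centroid under the weighted metric.
        labels = [
            min(range(k), key=lambda c: weighted_distance(p, centroids[c], weights))
            for p in points
        ]
        # Update step: with non-negative weights the per-feature mean still minimizes the
        # weighted squared distance, so centroids are plain means (kept as-is if a cluster empties).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy attribution vectors: feature 0 varies a lot, feature 1 carries the split the analyst cares about.
explanations = [[5.0, 0.0], [-5.0, 0.1], [5.1, 1.0], [-5.1, 1.1]]
print(weighted_kmeans(explanations, [1.0, 1.0], k=2))  # [0, 1, 0, 1]: grouped by feature 0
print(weighted_kmeans(explanations, [0.0, 1.0], k=2))  # [0, 0, 1, 1]: steered toward feature 1
```

Zeroing a weight removes that feature from the distance entirely. This is one simple way an analyst's feature selection can reshape the resulting subpopulations without recomputing the explanations themselves.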

Abstract

Interpreting machine learning (ML) models is of paramount importance when they inform decisions with societal impacts, such as transport control, financial activities, and medical diagnosis. Current model interpretation methods either approximate a model with locally linear functions or build self-explanatory models that explain each input instance. They do not address interpretation at the subpopulation level, that is, how a model's explanations vary across different subsets of a dataset. To address the challenge of explaining an ML model across the whole dataset, we propose SUBPLEX, a visual analytics system that helps users understand black-box model explanations through subpopulation visual analysis. We designed SUBPLEX through an iterative process with machine learning researchers to address three real-world usage scenarios: model debugging, feature selection, and bias detection. The system combines a novel subpopulation analysis of ML model explanations with interactive visualization, so that users can explore explanations across a dataset at different levels of granularity. Using the system, we conduct a user evaluation to assess how understanding interpretations at the subpopulation level influences users' sensemaking when interpreting ML models. Our results suggest that by providing model explanations for different groups of data, SUBPLEX encourages users to generate more ingenious ideas to enrich the interpretations. It also helps users tightly integrate their programming workflow with the visual analytics workflow. Finally, we summarize the considerations we observed when applying visualization to machine learning interpretation.
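
As a minimal sketch of what explanations at the subpopulation level can mean in practice, assume two inputs are available: local attributions already computed per instance (for example with LIME or SHAP), and a subpopulation label for each instance. The hypothetical helper `summarize_subpopulations` averages the attributions within each group and ranks features by the magnitude of the group mean. Both the helper and the choice of the signed mean are illustrative assumptions, not SUBPLEX's API.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def summarize_subpopulations(
    attributions: Sequence[Sequence[float]],
    labels: Sequence[int],
    feature_names: Sequence[str],
) -> Dict[int, List[Tuple[str, float]]]:
    """Mean signed attribution per feature within each subpopulation, ranked by magnitude."""
    groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for row, label in zip(attributions, labels):
        groups[label].append(row)

    summary: Dict[int, List[Tuple[str, float]]] = {}
    for label, rows in groups.items():
        # Column-wise mean over the instances in this subpopulation.
        means = [sum(col) / len(rows) for col in zip(*rows)]
        # Rank features by |mean| so the dominant drivers of each group come first.
        summary[label] = sorted(zip(feature_names, means), key=lambda fm: abs(fm[1]), reverse=True)
    return summary


# Toy per-instance attributions for two features and two subpopulations.
attrs = [[0.5, -0.1], [0.3, -0.1], [0.0, 0.4], [0.1, 0.2]]
summary = summarize_subpopulations(attrs, [0, 0, 1, 1], ["income", "age"])
print([name for name, _ in summary[0]])  # ['income', 'age']
print([name for name, _ in summary[1]])  # ['age', 'income']
```

Keeping the sign, rather than averaging absolute values, preserves whether a feature pushes a group's predictions up or down. The trade-off is that opposing attributions within a mixed group can cancel out.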

Paper Structure

This paper contains 23 sections, 6 equations, and 9 figures.

Figures (9)

  • Figure 1: An illustration of refining LIME subpopulation results with a synthetic dataset. The ground truth of the classifier from SHAP contains three decision boundaries. (a) The original subpopulation result of the explanations does not reflect the ground truth. (b) After selecting the features related to proline, which are the main rationale of the model, the subpopulation result shows three groups and becomes consistent with the ground truth.
  • Figure 2: Hierarchical Task Abstraction (HTA) of local explanation analysis using box-and-line notation. We follow the standard conventions for hierarchical task analysis kurniawan2004interaction, in which tasks are represented by named boxes with a unique ID that also indicates the task's hierarchical level. Task abstractions based on lam2017bridging are highlighted in orange. The related design requirements are annotated on each abstract task.
  • Figure 3: An overview of the human-in-the-loop pipeline for local explanation analysis in the coding environment.
  • Figure 4: SUBPLEX contains five linked views: (a) code block, (b) cluster refinement view, (c) projection view, (d) subpopulation creation panel, (e) local explanation detail view.
  • Figure 5: Three methods of selecting/creating a subpopulation for inspection.
  • ...and 4 more figures
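
Figure 1 shows subpopulations becoming consistent with the ground truth once the analysis is restricted to the proline-related features, and the TL;DR mentions a feature-scoring scheme for refining clusters. The source does not give the scoring formula. The sketch below therefore substitutes a standard statistic, the correlation ratio (eta squared: between-cluster sum of squares over total sum of squares), computed per feature on the attribution values. A higher score marks a feature whose attributions differ most across the current clusters, making it a candidate for selection. `separation_scores` is a hypothetical name, and the statistic is an assumption, not SUBPLEX's published scheme.

```python
from typing import List, Sequence


def separation_scores(attributions: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """Per-feature correlation ratio (eta squared) of attribution values w.r.t. cluster labels.

    Score = between-cluster sum of squares / total sum of squares, in [0, 1].
    """
    n = len(attributions)
    clusters = sorted(set(labels))
    scores: List[float] = []
    for j in range(len(attributions[0])):
        values = [row[j] for row in attributions]
        grand_mean = sum(values) / n
        total_ss = sum((v - grand_mean) ** 2 for v in values)
        between_ss = 0.0
        for c in clusters:
            members = [v for v, lab in zip(values, labels) if lab == c]
            cluster_mean = sum(members) / len(members)
            between_ss += len(members) * (cluster_mean - grand_mean) ** 2
        # A feature whose attribution is constant cannot separate clusters.
        scores.append(between_ss / total_ss if total_ss > 0 else 0.0)
    return scores


# Feature 0's attribution flips sign between the two clusters; feature 1's does not.
attrs = [[1.0, 0.2], [1.0, -0.2], [-1.0, 0.2], [-1.0, -0.2]]
print(separation_scores(attrs, [0, 0, 1, 1]))  # [1.0, 0.0]
```

Because the score is normalized by each feature's total variance, a feature with small but cleanly separated attributions can outrank one with large but noisy attributions. Whether that is desirable depends on the analysis.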