Case-based Explainability for Random Forest: Prototypes, Critics, Counter-factuals and Semi-factuals

Gregory Yampolsky, Dhruv Desai, Mingshu Li, Stefano Pasquali, Dhagash Mehta

TL;DR

This study addresses the challenge of explaining Random Forest predictions in regulated domains by integrating geometry- and accuracy-preserving RF-GAP proximities into Explainable Case-Based Reasoning (XCBR). It defines four types of explanantia (prototypes, critics, semi-factuals, and counter-factuals) and provides concrete methods to identify them from RF proximities. In experiments on toy datasets and a real-world funds dataset, GAP proximities yield stronger explanatory power than traditional Euclidean metrics across a broad suite of evaluation metrics (distance, sparsity, plausibility, confusability, diversity, robustness, and prototype-specific measures). The results, including MNIST visualizations and fund classifications, suggest that this RF-GAP-based XCBR framework offers practical, transparent explanations that can support regulatory needs and decision justification, with clear paths for extension to other tree ensembles and domains.

Abstract

The explainability of black-box machine learning algorithms, commonly known as Explainable Artificial Intelligence (XAI), has become crucial for financial and other regulated industrial applications due to regulatory requirements and the need for transparency in business practices. Among the various paradigms of XAI, Explainable Case-Based Reasoning (XCBR) stands out as a pragmatic approach that elucidates the output of a model by referencing actual examples from the data used to train or test the model. Despite its potential, XCBR has until recently been relatively underexplored for many algorithms, such as tree-based models. We start by observing that most XCBR methods are defined in terms of the distance metric learned by the algorithm. Using a recently proposed technique to extract the distance metric learned by Random Forests (RFs), which is both geometry- and accuracy-preserving, we investigate various XCBR methods. These methods amount to identifying special points from the training datasets, such as prototypes, critics, counter-factuals, and semi-factuals, to explain the predictions of the RF for a given query. We evaluate these special points using various evaluation metrics to assess their explanatory power and effectiveness.
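
As a rough illustration of what "identifying special points" from a learned proximity can look like, the sketch below picks a counter-factual and a semi-factual for a query from a precomputed proximity matrix. It rests on several assumptions that are not taken from the paper: the matrix `PROXIMITY` is a hand-written toy stand-in rather than actual RF-GAP output, the function name `counterfactual_and_semifactual` is hypothetical, and the selection rules (counter-factual = most similar point with a different label, semi-factual = least similar point with the same label) are a common simplification in the XCBR literature that may differ from the definitions used here.

```python
from typing import List, Optional, Sequence, Tuple

# Toy stand-in for a learned proximity matrix (larger = more similar).
# Assumption: hand-written values, NOT computed by RF-GAP.
PROXIMITY: List[List[float]] = [
    [1.0, 0.8, 0.3, 0.4, 0.1],
    [0.8, 1.0, 0.5, 0.2, 0.1],
    [0.3, 0.5, 1.0, 0.6, 0.2],
    [0.4, 0.2, 0.6, 1.0, 0.7],
    [0.1, 0.1, 0.2, 0.7, 1.0],
]
LABELS: List[str] = ["A", "A", "A", "B", "B"]


def counterfactual_and_semifactual(
    proximity: Sequence[Sequence[float]],
    labels: Sequence[str],
    query: int,
) -> Tuple[Optional[int], Optional[int]]:
    """Return (counterfactual_index, semifactual_index) for `query`.

    Assumed, simplified rules (not necessarily the paper's):
      - counter-factual: the differently labeled point most similar to the query
      - semi-factual: the same-label point least similar to the query
    Either entry is None when no candidate exists.
    """
    row = proximity[query]
    unlike = [j for j in range(len(labels)) if labels[j] != labels[query]]
    like = [
        j for j in range(len(labels))
        if j != query and labels[j] == labels[query]
    ]
    counterfactual = max(unlike, key=lambda j: row[j]) if unlike else None
    semifactual = min(like, key=lambda j: row[j]) if like else None
    return counterfactual, semifactual


if __name__ == "__main__":
    print(counterfactual_and_semifactual(PROXIMITY, LABELS, 0))
```

In practice the toy matrix would be replaced by proximities extracted from a trained forest, and the labels by the forest's predictions. Only the row for the query is read, so the sketch does not require the proximity matrix to be symmetric.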

Paper Structure (36 sections, 6 equations, 2 figures, 5 tables)