Enabling Regional Explainability by Automatic and Model-agnostic Rule Extraction

Yu Chen; Tianyu Cui; Alexander Capstick; Nan Fletcher-Loyd; Payam Barnaghi

TL;DR

This paper tackles the challenge of explaining black-box models in regions with underrepresented data by introducing AMORE, a model-agnostic system for automatic regional rule extraction. AMORE combines feature selection based on integrated-gradients importance with FP-Growth to identify frequent feature sets, and uses a histogram-based, discretization-free approach to generate intervals for numeric features, enabling targeted rule construction for specific data subgroups. The framework defines three evaluation criteria (Support, Confidence, Fitness) and provides mechanisms for local explanations at the sample level, and it improves on a decision-tree baseline across diverse tasks, including diabetes, sepsis, molecular toxicity, MNIST, and brain-tumor MRI. Overall, AMORE advances regional explainability in imbalanced domains by delivering interpretable, model-agnostic rules while controlling rule complexity and computational cost. The work demonstrates practical utility on both tabular and non-tabular data and outlines avenues for extending the method to sequential rules and OR-combinations of subspaces.
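As a rough illustration of how a rule over numeric intervals might be scored, the sketch below computes support and confidence for an IF-THEN rule against a region of interest. It assumes the conventional association-rule definitions (support as the fraction of all samples that satisfy the rule and lie in the region, confidence as the fraction of rule-satisfying samples that lie in the region). The paper's exact definitions, and its Fitness criterion, are not given in this summary and are not reproduced here. The function names, the toy data, and the thresholds are hypothetical.

```python
import numpy as np

def rule_mask(X, intervals):
    """Return a boolean mask of rows that satisfy every interval.

    `intervals` maps a column index to an inclusive (low, high) pair.
    """
    mask = np.ones(X.shape[0], dtype=bool)
    for col, (low, high) in intervals.items():
        mask &= (X[:, col] >= low) & (X[:, col] <= high)
    return mask

def support_confidence(X, in_region, intervals):
    """Conventional support and confidence of 'IF intervals THEN in region'.

    These are assumed definitions, not necessarily the paper's.
    """
    covered = rule_mask(X, intervals)
    both = covered & in_region
    support = both.sum() / len(X)
    confidence = both.sum() / covered.sum() if covered.any() else 0.0
    return support, confidence

# Toy data (not from the paper): two numeric features per sample.
X = np.array([
    [5.0, 90.0],
    [6.8, 150.0],
    [7.2, 160.0],
    [5.5, 100.0],
    [6.9, 95.0],
    [7.5, 200.0],
])
# Region of interest, e.g. samples the model predicts as positive.
in_region = np.array([False, True, True, False, False, True])

# Hypothetical rules: a single-interval rule and a tighter two-interval rule.
rule_one = {0: (6.5, np.inf)}
rule_two = {0: (6.5, np.inf), 1: (140.0, np.inf)}

support_one, confidence_one = support_confidence(X, in_region, rule_one)
support_two, confidence_two = support_confidence(X, in_region, rule_two)
print(support_one, confidence_one)
print(support_two, confidence_two)
```

In this toy case, adding the second interval removes the one covered sample that lies outside the region, so confidence rises while support is unchanged. A longer rule trades complexity for precision in the same way.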

Abstract

In Explainable AI, rule extraction translates model knowledge into logical rules, such as IF-THEN statements, crucial for understanding patterns learned by black-box models. This could significantly aid in fields like disease diagnosis, disease progression estimation, or drug discovery. However, such application domains often contain imbalanced data, with the class of interest underrepresented. Existing methods inevitably compromise the performance of rules for the minor class to maximise the overall performance. As the first attempt in this field, we propose a model-agnostic approach for extracting rules from specific subgroups of data, featuring automatic rule generation for numerical features. This method enhances the regional explainability of machine learning models and offers wider applicability compared to existing methods. We additionally introduce a new method for selecting features to compose rules, reducing computational costs in high-dimensional spaces. Experiments across various datasets and models demonstrate the effectiveness of our methods.
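The feature-selection step relies on a per-feature importance score, which the TL;DR attributes to integrated gradients. A minimal sketch of that attribution could look like the following, assuming a logistic model with an analytic gradient as a stand-in for the black-box model and a midpoint Riemann sum for the path integral. The weights, input, baseline, and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w, b):
    """Stand-in differentiable model: logistic regression output."""
    return sigmoid(x @ w + b)

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate integrated gradients along the straight path
    from `baseline` to `x` with a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps             # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)    # shape (steps, d)
    p = model(path, w, b)                                 # shape (steps,)
    grads = (p * (1.0 - p))[:, None] * w                  # analytic gradient per step
    return (x - baseline) * grads.mean(axis=0)

# Illustrative values only.
w = np.array([2.0, -1.0, 0.0])
b = 0.1
x = np.array([1.0, 0.5, 3.0])
baseline = np.zeros(3)

attr = integrated_gradients(x, baseline, w, b)
ranking = np.argsort(-np.abs(attr))   # most important feature first
print(attr, ranking)
```

The attributions sum approximately to the difference between the model output at the input and at the baseline, which is a convenient sanity check. Ranking features by absolute attribution is one plausible way to shortlist candidates before composing rules.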

Paper Structure (23 sections, 1 theorem, 10 equations, 9 figures, 4 tables, 5 algorithms)

Introduction
Related Work
Methods
Feature selection for rule extraction
Preliminary
Feature importance measured by integrated gradients
Frequently important feature sets
AMORE
Generating rules for numerical features
Evaluation criteria
Results
Baseline method
Hyperparameters
Task 1 -- Diabetes prediction
Task 2 -- Sepsis prediction
...and 8 more sections

Key Result

Proposition 1

Given two exclusive intervals of a feature: $\mathbb{X}_{a}$, $\mathbb{X}_{b}$, where $\mathbb{X}_{a} \cap \mathbb{X}_{b} = \emptyset$, and their corresponding ratios $\mathbf{r}_{a}, \mathbf{r}_{b}$, if $0 \le \mathbf{r}_{a} < \mathbf{r}_{b}$, then the merged interval $\mathbb{X}_{m} = \mathbb{X}_{

Figures (9)

Figure 1: An example scenario of applying regional rule extraction for diabetes prediction. The data region of interest is the region that contains samples predicted as positive diabetes cases by the model. The results are from our experiments on a diabetes prediction task (diabetes@kaggle). In the extracted rules, Hemoglobin A1c (HbA1c) is a measure of a person's average blood sugar level over the past 2-3 months.
Figure 1: Sensitivity analysis of the confidence lower bound $\iota$ (to be continued). We compare AMORE and DT classifiers while varying the minimum support under different confidence lower bounds. The confidence lower bound $\iota$ is set to each of 0.7, 0.8, and 0.9 for all tasks. Changing $\iota$ does not affect the diabetes and brain tumor tasks. For the other tasks, the differences are not significant, and AMORE still performs as well as or better than DT classifiers.
Figure 2: The workflow of our methods. The algorithm for finding frequently important feature sets (label alg:find_freq_features) is optional, depending on the dimensionality of the data.
Figure 2: Sensitivity analysis of the confidence lower bound $\iota$ (continued).
Figure 3: Visualization of the threshold selection and feature interval searching in a multi-mode scenario.
...and 4 more figures

Theorems & Definitions (2)

Proposition 1
proof
