Enabling Regional Explainability by Automatic and Model-agnostic Rule Extraction
Yu Chen, Tianyu Cui, Alexander Capstick, Nan Fletcher-Loyd, Payam Barnaghi
TL;DR
This paper tackles the challenge of explaining black-box models in regions where data is underrepresented by introducing AMORE, a model-agnostic system for automatic regional rule extraction. AMORE selects features using integrated-gradients-based importance together with FP-Growth, which identifies frequent feature sets, and generates numeric feature intervals with a histogram-based, discretization-free approach, so that rules can be constructed for specific data subgroups. The framework defines explicit evaluation criteria (Support, Confidence, Fitness) and provides sample-level local explanations. It shows improvements over a decision-tree baseline across diverse tasks, including diabetes, sepsis, molecular toxicity, MNIST, and brain-tumor MRI. Overall, AMORE advances regional explainability in imbalanced domains by delivering high-quality, interpretable, model-agnostic rules while controlling rule complexity and computational cost. The work demonstrates practical utility on both tabular and non-tabular data and outlines possible extensions to sequential rules and OR-combinations of subspaces.
Abstract
In Explainable AI, rule extraction translates model knowledge into logical rules, such as IF-THEN statements, which are crucial for understanding the patterns learned by black-box models. Such rules could significantly aid fields like disease diagnosis, disease progression estimation, and drug discovery. However, these application domains often contain imbalanced data, with the class of interest underrepresented. Existing methods inevitably compromise the performance of rules for the minority class in order to maximise overall performance. This work is the first attempt in this field to address the problem: we propose a model-agnostic approach for extracting rules from specific subgroups of data, featuring automatic rule generation for numerical features. This method enhances the regional explainability of machine learning models and offers wider applicability than existing methods. We additionally introduce a new method for selecting the features that compose rules, which reduces computational costs in high-dimensional spaces. Experiments across various datasets and models demonstrate the effectiveness of our methods.
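To make the idea of a regional IF-THEN rule concrete, the following is a minimal sketch of a rule built from numeric feature intervals and scored against a target class. It is not the AMORE implementation. The names `Interval`, `rule_matches`, and `support_and_confidence` are hypothetical, the data is made up for illustration, and the definitions used here (support as the fraction of target-class samples the rule covers, confidence as the fraction of covered samples that belong to the target class) are assumptions that may differ from the paper's exact formulation. The Fitness criterion is not sketched because its definition is not given in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """One rule condition: low <= sample[feature] <= high (hypothetical helper)."""
    feature: str
    low: float
    high: float

    def covers(self, sample: dict) -> bool:
        return self.low <= sample[self.feature] <= self.high


def rule_matches(rule, sample) -> bool:
    """A rule is an AND-combination of interval conditions."""
    return all(cond.covers(sample) for cond in rule)


def support_and_confidence(rule, samples, labels, target):
    """Score a rule for one target class.

    Assumed definitions (not taken from the paper):
      support    = covered target-class samples / all target-class samples
      confidence = covered target-class samples / all covered samples
    """
    matched = [rule_matches(rule, s) for s in samples]
    n_target = sum(1 for y in labels if y == target)
    n_matched = sum(matched)
    n_both = sum(1 for m, y in zip(matched, labels) if m and y == target)
    support = n_both / n_target if n_target else 0.0
    confidence = n_both / n_matched if n_matched else 0.0
    return support, confidence


# Toy, made-up data: the positive class (label 1) is the minority.
samples = [
    {"glucose": 160, "bmi": 34},
    {"glucose": 150, "bmi": 31},
    {"glucose": 155, "bmi": 24},
    {"glucose": 145, "bmi": 33},
    {"glucose": 100, "bmi": 22},
    {"glucose": 95, "bmi": 35},
    {"glucose": 110, "bmi": 27},
    {"glucose": 90, "bmi": 21},
]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

# IF 140 <= glucose <= 200 AND 30 <= bmi <= 50 THEN class 1
rule = [Interval("glucose", 140, 200), Interval("bmi", 30, 50)]

support, confidence = support_and_confidence(rule, samples, labels, target=1)
print(f"support={support:.3f}, confidence={confidence:.3f}")
```

Scoring the rule against a single target class, rather than over the whole dataset, reflects the regional focus described above: a rule can be judged by how well it characterises the minority class without being penalised for ignoring the majority class.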
