Unified Explanations in Machine Learning Models: A Perturbation Approach

Jacob Dineen; Don Kridel; Daniel Dolk; David Castillo

TL;DR

The paper tackles the explainability gap in complex ML models by introducing a perturbation-based framework that assesses how explanations behave under dynamic input changes. It defines two relative feature importance measures, Absolute Normalized Shap and Absolute Normalized Weighted Average (ANWA), and uses Cosine and Jaccard similarities to quantify static-dynamic alignment. A dynamic perturbation algorithm is proposed to simulate out-of-sample perturbations, enabling a systematic comparison with SHAP explanations across multiple datasets and models. Findings show generally strong alignment on simple, low-dimensional datasets but notable divergence on high-dimensional or heterogeneous feature spaces, underscoring the need for harmonized explanations in production. The approach offers a lightweight, model-agnostic means to validate explanations and generate reference narratives for transparent, trustworthy decision-making, with potential extensions to finance and non-tabular data.
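As a rough illustration of the static-dynamic alignment check described above, the sketch below compares two relative importance vectors with cosine similarity. It is not the authors' implementation: the helper names `abs_normalize` and `cosine_similarity`, the example scores, and the assumption that "absolute normalized" means taking absolute values and dividing by their sum are all ours.

```python
import math


def abs_normalize(values):
    """Take absolute values and rescale so they sum to 1.

    ASSUMPTION: this is our reading of "absolute normalized"; the paper's
    exact definition is not reproduced in this summary.
    """
    magnitudes = [abs(v) for v in values]
    total = sum(magnitudes)
    if total == 0:
        return [0.0 for _ in magnitudes]
    return [m / total for m in magnitudes]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length importance vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_a * norm_b)


# Hypothetical per-feature scores: one from a static explainer,
# one from a dynamic perturbation run. These are made-up numbers.
static_importance = abs_normalize([0.2, -0.6, 0.1, 0.1])
dynamic_importance = abs_normalize([0.3, 0.5, 0.1, 0.1])
alignment = cosine_similarity(static_importance, dynamic_importance)
```

A value of `alignment` near 1 would suggest the two methods distribute importance across features in similar proportions; a lower value would flag the kind of divergence the paper reports on high-dimensional datasets.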

Abstract

A high-velocity paradigm shift towards Explainable Artificial Intelligence (XAI) has emerged in recent years. Highly complex Machine Learning (ML) models have flourished in many tasks of intelligence, and the questions have started to shift away from traditional metrics of validity towards something deeper: What is this model telling me about my data, and how is it arriving at these conclusions? Inconsistencies between XAI and modeling techniques can have the undesirable effect of casting doubt upon the efficacy of these explainability approaches. To address these problems, we propose a systematic, perturbation-based analysis against a popular, model-agnostic method in XAI, SHapley Additive exPlanations (Shap). We devise algorithms to generate relative feature importance in settings of dynamic inference amongst a suite of popular machine learning and deep learning methods, and metrics that allow us to quantify how well explanations generated under the static case hold. We propose a taxonomy for feature importance methodology, measure alignment, and observe quantifiable similarity amongst explanation models across several datasets.

Paper Structure (24 sections, 9 equations, 4 figures, 7 tables, 2 algorithms)

Introduction
Background
Preliminaries
Related Work
Methodology
Absolute Normalized Shap
Absolute Normalized Weighted Average (ANWA)
Metric Comparison
Experiments
Datasets
Performance Metrics
Shap vs ANWA - A Drill Down
Systematic Comparison
Sample Size Analysis
Ranked Similarity
...and 9 more sections

Figures (4)

Figure 1: Shap can be used to generate local or global explanations about each label $y_i \in Y$. The global explanation for the $0^{th}$ class of the Iris Flower Dataset is shown above. The graph shows us that when looking at predictions, globally, made for the $0^{th}$ class, our model is generally not using petal width, sepal width, or sepal length to draw a decision boundary. Petal length SRGs are more dispersed and are generally showing that a high value is more indicative of a lower $p(y_i = 0)$, and a lower value has a higher influence on mapping it to the $0^{th}$ class $p(y_i = 1)$.
Figure 2: Analysis conducted on the Iris (Top) and Cancer (Bottom) datasets [Dua:2019]. Left: Absolute Normalized Shap Values for each feature. Right: Absolute Normalized Weighted Averages for each feature following our methodology for dynamic perturbation. We fix the arbitrary metric in the ANWA equation as $m =$ accuracy for simplification. The legend denotes abbreviations that follow the mapping: 'nn' : Multilayer Perceptron Neural Network, 'svm' : Support Vector Machine Classifier, 'logit' : Logistic Regression, 'rf' : Random Forest Classifier, 'knn' : K-Nearest Neighbors Classifier, 'gbc' : Gradient Boosted Trees Classifier.
Figure 3: Left: Boxplot displaying cosine similarity between Shap and ANWA by dataset. The colors map to specific predictive modeling techniques. Right: Boxplot displaying similarity between Shap and ANWA across all models, with the color mapping to the specific performance metric $m$ used.
Figure 4: Jaccard Similarity by Model, by Dataset. The similarity is measured by the intersection of two sets against their union. The two sets contain the top-k features for each method under static and dynamic XAI.
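The top-k Jaccard comparison described in the Figure 4 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the tie-breaking behavior of the sort, and the importance scores are hypothetical.

```python
def top_k_features(importance, k):
    """Return the set of the k feature names with the largest scores."""
    ranked = sorted(importance, key=lambda name: importance[name], reverse=True)
    return set(ranked[:k])


def jaccard_similarity(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    union = set_a | set_b
    if not union:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(set_a & set_b) / len(union)


# Hypothetical relative importances for the four Iris features (made-up values).
static_scores = {
    "petal length": 0.60,
    "petal width": 0.25,
    "sepal length": 0.10,
    "sepal width": 0.05,
}
dynamic_scores = {
    "petal length": 0.50,
    "sepal length": 0.30,
    "petal width": 0.15,
    "sepal width": 0.05,
}

k = 2
static_top = top_k_features(static_scores, k)
dynamic_top = top_k_features(dynamic_scores, k)
score = jaccard_similarity(static_top, dynamic_top)
```

Here the two top-2 sets share only "petal length" out of three distinct features, so `score` is 1/3. Unlike cosine similarity, this measure ignores the magnitudes and looks only at which features make the cut.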
