Unified Explanations in Machine Learning Models: A Perturbation Approach
Jacob Dineen, Don Kridel, Daniel Dolk, David Castillo
TL;DR
The paper tackles the explainability gap in complex ML models by introducing a perturbation-based framework that assesses how explanations behave under dynamic input changes. It defines two relative feature importance measures, Absolute Normalized Shap and Absolute Normalized Weighted Average (ANWA), and uses Cosine and Jaccard similarity to quantify how closely static and dynamic explanations align. A dynamic perturbation algorithm is proposed to simulate out-of-sample perturbations, enabling a systematic comparison with Shap explanations across multiple datasets and models. Findings show generally strong alignment on simple, low-dimensional datasets but notable divergence on high-dimensional or heterogeneous feature spaces, underscoring the need for harmonized explanations in production. The approach offers a lightweight, model-agnostic means to validate explanations and generate reference narratives for transparent, trustworthy decision-making, with potential extensions to finance and non-tabular data.
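The TL;DR names Cosine and Jaccard similarity as the alignment metrics without spelling out what they are applied to. The following is a minimal sketch under stated assumptions, not the authors' implementation: each explanation is reduced to one attribution score per feature, "absolute normalized" is read as taking absolute values and rescaling them to sum to one, and Jaccard similarity is computed on the top-k feature sets. The helper names (`abs_normalize`, `cosine_similarity`, `top_k`, `jaccard_top_k`) are hypothetical.

```python
import math


def abs_normalize(scores):
    """Assumed normalization: absolute attributions rescaled to sum to 1."""
    mags = [abs(s) for s in scores]
    total = sum(mags)
    if total == 0:
        return [0.0 for _ in mags]
    return [m / total for m in mags]


def cosine_similarity(a, b):
    """Cosine of the angle between two importance vectors (0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(importances, k):
    """Indices of the k largest importances; ties go to the lower index."""
    order = sorted(range(len(importances)), key=lambda i: (-importances[i], i))
    return set(order[:k])


def jaccard_top_k(a, b, k):
    """Assumed Jaccard variant: overlap of the two top-k feature sets."""
    set_a, set_b = top_k(a, k), top_k(b, k)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0
```

For example, with a static explanation `[0.5, -0.3, 0.1, 0.1]` and a dynamic one `[0.4, 0.1, -0.35, 0.15]`, the top-2 sets are `{0, 1}` and `{0, 2}`, so the Jaccard score is 1/3, while the cosine score stays fairly high because the dominant feature agrees. Cosine rewards agreement in overall magnitude; Jaccard only asks whether the same features make the cut.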
Abstract
A high-velocity paradigm shift towards Explainable Artificial Intelligence (XAI) has emerged in recent years. Highly complex Machine Learning (ML) models have flourished across a wide range of tasks, and the questions have started to shift away from traditional metrics of validity towards something deeper: What is this model telling me about my data, and how is it arriving at these conclusions? Inconsistencies between XAI and modeling techniques can have the undesirable effect of casting doubt upon the efficacy of these explainability approaches. To address these problems, we propose a systematic, perturbation-based analysis against a popular, model-agnostic method in XAI, SHapley Additive exPlanations (Shap). We devise algorithms that generate relative feature importance under dynamic inference for a suite of popular machine learning and deep learning methods, along with metrics that quantify how well explanations generated in the static case continue to hold. We propose a taxonomy for feature importance methodology, measure alignment, and observe quantifiable similarity amongst explanation models across several datasets.
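The abstract describes generating relative feature importance from perturbations at inference time but does not give the perturbation scheme. As a rough illustration only, the sketch below uses a generic one-feature-at-a-time sensitivity measure: Gaussian noise is added to a single feature, the mean absolute change in the model's output is recorded, and the results are normalized to sum to one. The noise model, the `scale` and `n_repeats` parameters, and the name `perturbation_importance` are all assumptions; this is not the paper's dynamic perturbation algorithm or its ANWA measure.

```python
import random


def perturbation_importance(predict, X, scale=0.1, n_repeats=5, seed=0):
    """Hypothetical perturbation sensitivity (not the paper's algorithm).

    predict: callable mapping one feature list to a scalar prediction.
    X: list of rows (lists of floats).
    Returns one relative importance per feature, summing to 1 when any
    feature changes the output.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n_features = len(X[0])
    baseline = [predict(row) for row in X]
    raw = []
    for j in range(n_features):
        total = 0.0
        for _ in range(n_repeats):
            for row, y0 in zip(X, baseline):
                perturbed = list(row)
                perturbed[j] += rng.gauss(0.0, scale)  # perturb feature j only
                total += abs(predict(perturbed) - y0)
        raw.append(total / (n_repeats * len(X)))  # mean absolute output change
    s = sum(raw)
    return [r / s for r in raw] if s > 0 else [0.0] * n_features
```

For a linear model such as `lambda r: 3.0 * r[0] + 1.0 * r[2]`, this sketch gives zero importance to the unused second feature and roughly three times as much to the first feature as to the third. Importances of this kind could then be compared against normalized Shap values with a similarity metric such as cosine or Jaccard.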
