Data-centric Prediction Explanation via Kernelized Stein Discrepancy
Mahtab Sarvmaili, Hassan Sajjad, Ga Wu
TL;DR
This paper addresses the need for fine-grained, efficient prediction explanations by introducing HD-Explain, a data-centric, post-hoc method that uses Kernelized Stein Discrepancy (KSD) to define a model-dependent kernel encoding data correlations. By relaxing the marginal input distribution to the training data distribution and using a score function derived from the model, it retrieves the top-k training samples that provide the strongest predictive support for a test point, without perturbing the model. Experiments on CIFAR-10, SVHN, and medical imaging datasets, evaluated with Hit Rate, Coverage, and Run Time, show improved precision, consistency, and scalability over existing methods such as Influence Function, RPS, and TracIn, and offer insights into kernel choices (Linear, RBF, IMQ) and last-layer variants. The work advances transparency in ML systems by offering a faithful, instance-level explanation mechanism that integrates model-aware data correlations into the explanation process, potentially aiding trust and debugging in high-stakes settings.
Abstract
Existing example-based prediction explanation methods often bridge test and training data points through the model's parameters or latent representations. While these methods offer clues to the causes of model predictions, they often exhibit innate shortcomings, such as incurring significant computational overhead or producing coarse-grained explanations. This paper presents Highly-precise and Data-centric Explanation (HD-Explain), a prediction explanation method that exploits properties of Kernelized Stein Discrepancy (KSD). Specifically, the KSD uniquely defines a parameterized kernel function for a trained model that encodes model-dependent data correlation. By leveraging this kernel function, one can efficiently identify the training samples that provide the best predictive support to a test point. We conducted thorough analyses and experiments across multiple classification domains, where we show that HD-Explain outperforms existing methods in several respects, including 1) preciseness (fine-grained explanation), 2) consistency, and 3) computational efficiency, leading to a surprisingly simple, effective, and robust prediction explanation solution.
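
The retrieval idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard Stein kernel built on an RBF base kernel, and it takes the score vectors as given inputs rather than deriving them from a trained model as HD-Explain does. The function names (stein_kernel_rbf, top_k_support), the bandwidth h, and the Gaussian placeholder score s(x) = -x used in the demo are all hypothetical choices made for illustration.

```python
import numpy as np


def stein_kernel_rbf(a, b, score_a, score_b, h=1.0):
    """Stein kernel between points a and b with an RBF base kernel.

    score_a and score_b are the score-function values at a and b. Here they
    are supplied by the caller (an assumption of this sketch).
    """
    d = a.shape[0]
    diff = a - b
    sq = float(diff @ diff)
    k = np.exp(-sq / (2.0 * h**2))           # base kernel k(a, b)
    grad_a = -diff / h**2 * k                # gradient of k with respect to a
    grad_b = diff / h**2 * k                 # gradient of k with respect to b
    trace = k * (d / h**2 - sq / h**4)       # trace of the mixed second derivative
    return float(
        (score_a @ score_b) * k
        + score_a @ grad_b
        + score_b @ grad_a
        + trace
    )


def top_k_support(test_point, test_score, train_points, train_scores, k=3, h=1.0):
    """Return indices and kernel values of the k training points with the
    largest Stein-kernel value against the test point."""
    values = np.array([
        stein_kernel_rbf(test_point, x, test_score, s, h)
        for x, s in zip(train_points, train_scores)
    ])
    order = np.argsort(-values)[:k]          # descending by kernel value
    return order, values[order]


# Toy demo with a placeholder score s(x) = -x (score of a standard Gaussian).
rng = np.random.default_rng(0)
train = rng.normal(size=(50, 4))
test = rng.normal(size=4)
idx, vals = top_k_support(test, -test, train, -train, k=3)
print(idx, vals)
```

The loop evaluates one kernel value per training point, so the cost is linear in the training set size for each test point; no model parameters are perturbed or retrained in this sketch.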
