Observation-specific explanations through scattered data approximation
Valentina Ghidini, Michael Multerer, Jacopo Quizi, Rohan Sen
TL;DR
This work reframes explainability as quantifying the influence of individual data points on a black-box predictor, through observation-specific explanations. A surrogate model in a reproducing kernel Hilbert space $\mathcal{H}$ is built via scattered data approximation, with orthogonal matching pursuit selecting a small, informative subset of observations from which normalized explanations $\gamma_i$ are derived. The surrogate $f^*$ comes with a provable reconstruction bound on the sample set, $|f(x_i)-f^*(x_i)| \leq \varepsilon \|f\|_{\mathcal{H}}$, which enables per-point diagnostics. Empirical evaluations on two synthetic test functions (quadratic and Ackley) and on a real-world possum dataset show high fidelity and indicate that influential observations tend to lie in boundary or sparsely populated regions. This offers a data-centric lens on model behavior, with potential insights into data representativeness and model fit.
Abstract
This work introduces observation-specific explanations, which assign each data point a score proportional to its importance for the prediction process. Computing such explanations amounts to identifying the observations that are most influential for the black-box model of interest. The proposed method estimates these explanations by constructing a surrogate model through scattered data approximation, using the orthogonal matching pursuit algorithm. The approach is validated on both simulated and real-world datasets.
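The pipeline summarized above can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a Gaussian kernel, a plain greedy orthogonal matching pursuit over kernel columns, and scores $\gamma_i$ defined as absolute surrogate coefficients normalized to sum to one. The kernel choice, the normalization, and all function names (`gaussian_kernel`, `kernel_omp`, `observation_scores`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_kernel(X, Y, lengthscale=1.0):
    """Gaussian kernel matrix between the rows of X and Y (assumed kernel)."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))


def kernel_omp(K, y, max_terms, tol=1e-8):
    """Greedy OMP over the columns of K.

    Returns the indices of the selected observations and the least-squares
    coefficients of the surrogate restricted to those observations.
    """
    residual = y.copy()
    selected = []
    coef = np.zeros(0)
    col_norms = np.linalg.norm(K, axis=0)
    for _ in range(max_terms):
        # Pick the kernel column most correlated with the current residual.
        scores = np.abs(K.T @ residual) / col_norms
        if selected:
            scores[selected] = -np.inf  # never pick the same point twice
        selected.append(int(np.argmax(scores)))
        # Refit all selected coefficients jointly (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(K[:, selected], y, rcond=None)
        residual = y - K[:, selected] @ coef
        if np.linalg.norm(residual) <= tol * np.linalg.norm(y):
            break
    return np.array(selected), coef


def observation_scores(n, selected, coef):
    """Assumed normalization: |coefficient| rescaled so the scores sum to 1."""
    gamma = np.zeros(n)
    gamma[selected] = np.abs(coef)
    total = gamma.sum()
    return gamma / total if total > 0 else gamma


# Toy black box: a quadratic evaluated on scattered points in [-1, 1]^2.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(60, 2))
y = np.sum(X**2, axis=1)

K = gaussian_kernel(X, X, lengthscale=0.5)
selected, coef = kernel_omp(K, y, max_terms=15)
gamma = observation_scores(len(X), selected, coef)

# Per-point reconstruction error of the sparse surrogate on the sample set.
pointwise_error = np.abs(y - K[:, selected] @ coef)
print("selected observations:", selected)
print("max pointwise error:", pointwise_error.max())
```

In this sketch, observations that are never selected receive a score of zero, so the non-zero entries of `gamma` mark the small subset that carries the surrogate. The pointwise error plays the role of the per-point diagnostic $|f(x_i)-f^*(x_i)|$, although the sketch does not itself verify the stated bound.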
