Fast Explanations via Policy Gradient-Optimized Explainer
Deng Pan, Nuno Moniz, Nitesh Chawla
TL;DR
This work tackles the efficiency gap in attribution methods by proposing Fast Explanation (FEX), a policy-gradient-based explainer that treats attributions as a learnable distribution over feature masks. By replacing intractable empirical attributions with a tractable Bernoulli surrogate and optimizing via PPO with KL regularization, FEX achieves real-time explanations for black-box models with broad applicability. Empirical results across image and text tasks show substantial speedups (over 97% inference-time reduction) and memory savings (≈70%), while maintaining high-quality explanations and generalizability. The approach avoids reliance on proxy explanations and demonstrates strong performance even under varied trajectory lengths, data scales, and model architectures, highlighting its practical potential for scalable XAI in real-world systems.
Abstract
Delivering explanations efficiently is a critical barrier to the adoption of model explanations in real-world applications. Existing approaches often depend on extensive model queries for sample-level explanations, or rely on expert knowledge of specific model structures, trading general applicability for efficiency. To address these limitations, this paper introduces Fast Explanation (FEX), a novel framework that represents attribution-based explanations as probability distributions and optimizes them with the policy gradient method. The proposed framework offers a robust, scalable solution for real-time, large-scale model explanations, bridging the gap between efficiency and applicability. We validate the framework on image and text classification tasks. The experiments demonstrate that our method reduces inference time by over 97% and memory usage by 70% compared to traditional model-agnostic approaches, while maintaining high-quality explanations and broad applicability.
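To make the core idea concrete, the following is a minimal sketch of treating an attribution as a Bernoulli distribution over feature masks and improving it with a policy gradient. It is an illustration under stated assumptions, not the FEX implementation: it uses plain REINFORCE with a mean baseline rather than PPO with KL regularization, it learns free per-sample logits rather than a trained explainer network, and the toy model `black_box`, the reward (model score minus a sparsity penalty), and all hyperparameters are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def black_box(inputs):
    """Hypothetical black-box scorer: only features 0 and 1 affect the output."""
    weights = np.array([3.0, 3.0, 0.0, 0.0, 0.0, 0.0])
    return sigmoid(inputs @ weights - 3.0)


def train_mask_policy(x, steps=500, batch=64, lr=0.5, sparsity=0.05, seed=0):
    """Learn per-feature keep probabilities for one input x (assumed setup)."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(x.size)  # one Bernoulli logit per feature
    for _ in range(steps):
        keep_prob = sigmoid(logits)
        # Sample a batch of binary masks from the current Bernoulli policy.
        masks = (rng.random((batch, x.size)) < keep_prob).astype(float)
        # Assumed reward: model score on the masked input, minus a
        # penalty on the number of features kept.
        rewards = black_box(masks * x) - sparsity * masks.sum(axis=1)
        # Mean baseline to reduce the variance of the gradient estimate.
        advantages = rewards - rewards.mean()
        # For a Bernoulli policy, d log p(mask) / d logit = mask - keep_prob.
        grad = (advantages[:, None] * (masks - keep_prob)).mean(axis=0)
        logits += lr * grad  # gradient ascent on expected reward
    return sigmoid(logits)


probs = train_mask_policy(np.ones(6))
print(np.round(probs, 2))
```

In this sketch the learned keep probabilities can be read as attribution scores: features the toy model actually depends on should end with probabilities near one, while the others are pushed down by the sparsity penalty.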
