
Fast Explanations via Policy Gradient-Optimized Explainer

Deng Pan, Nuno Moniz, Nitesh Chawla

TL;DR

This work tackles the efficiency gap in attribution methods by proposing Fast Explanation (FEX), a policy-gradient-based explainer that treats attributions as a learnable distribution over feature masks. By replacing intractable empirical attributions with a tractable Bernoulli surrogate and optimizing via PPO with KL regularization, FEX achieves real-time explanations for black-box models with broad applicability. Empirical results across image and text tasks show substantial speedups (over 97% inference-time reduction) and memory savings (≈70%), while maintaining high-quality explanations and generalizability. The approach avoids reliance on proxy explanations and demonstrates strong performance even under varied trajectory lengths, data scales, and model architectures, highlighting its practical potential for scalable XAI in real-world systems.

Abstract

The challenge of delivering efficient explanations is a critical barrier that prevents the adoption of model explanations in real-world applications. Existing approaches often depend on extensive model queries for sample-level explanations, or rely on expert knowledge of specific model structures, trading general applicability for efficiency. To address these limitations, this paper introduces a novel framework, Fast Explanation (FEX), that represents attribution-based explanations as probability distributions, which are optimized with the policy gradient method. The proposed framework offers a robust, scalable solution for real-time, large-scale model explanations, bridging the gap between efficiency and applicability. We validate our framework on image and text classification tasks. The experiments demonstrate that our method reduces inference time by over 97% and memory usage by 70% compared to traditional model-agnostic approaches, while maintaining high-quality explanations and broad applicability.
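
To make the idea of optimizing a distribution over feature masks with a policy gradient more concrete, here is a minimal sketch. It is not the authors' implementation: it uses plain REINFORCE with a mean baseline rather than PPO with KL regularization, it learns per-feature logits for a single input rather than training an explainer network, and `black_box`, `fit_mask_policy`, and the `sparsity` penalty are hypothetical names and choices introduced only for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def black_box(masked_x):
    # Hypothetical stand-in for a black-box model: only features 0 and 1
    # contribute to the output score.
    return 2.0 * masked_x[..., 0] + 1.0 * masked_x[..., 1]


def fit_mask_policy(x, n_steps=300, n_samples=64, lr=0.5, sparsity=0.5, seed=0):
    """Learn Bernoulli keep-probabilities for each feature of one input x.

    The reward is the model score on the masked input minus a per-feature
    cost (an assumption of this sketch), so features that do not help the
    score are pushed towards probability 0.
    """
    rng = np.random.default_rng(seed)
    logits = np.zeros(x.shape[0])
    for _ in range(n_steps):
        probs = sigmoid(logits)
        # Sample binary masks from the current Bernoulli policy.
        masks = (rng.random((n_samples, x.shape[0])) < probs).astype(float)
        rewards = black_box(masks * x) - sparsity * masks.sum(axis=1)
        # Subtract the batch mean as a simple variance-reducing baseline.
        advantages = rewards - rewards.mean()
        # For a Bernoulli with p = sigmoid(logit), d log q(m) / d logit = m - p.
        grad = (advantages[:, None] * (masks - probs)).mean(axis=0)
        logits += lr * grad  # gradient ascent on expected reward
    return sigmoid(logits)
```

Calling `fit_mask_policy(np.ones(4))` returns one keep-probability per feature. In this toy setup the probabilities can be read as attribution scores: the two features the stand-in model actually uses should end up with high probability, and the other two with low probability.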

Paper Structure

This paper contains 29 sections, 21 equations, 4 figures, 4 tables, and 1 algorithm.

Figures (4)

  • Figure 1: An illustration of the proposed method. The process begins with empirical attribution, calculated by summing over $2^N$ terms. To address the computational intractability of this summation, the attribution is reformulated as an expectation over a probability distribution $p$. Subsequently, $p$ is approximated by a Bernoulli distribution $q$, enabling a closed-form solution that depends solely on the parameters of $q$. Finally, the parameters of $q$ are optimized using the policy gradient method, yielding an approximation of the empirical attribution.
  • Figure 2: Qualitative examples for explaining the predictions in the image classification task.
  • Figure 3: Quantitative evaluation results for the text classification task. The x-axis shows the number of text tokens inserted, starting from the most important token, and the y-axis shows the F1 score obtained with that number of tokens. Higher is better.
  • Figure 4: The top two predictions for this image are "golden retriever" and "Siamese cat". When $\lambda_{kl}=0$, the explainer cannot differentiate these two classes. When KL regularization is introduced, it gains the ability to generalize over different classes.
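
The insertion-style evaluation described in the Figure 3 caption can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's evaluation code: it scores a single input with a generic scalar `score_fn` instead of computing F1 over a dataset, and `insertion_curve`, `score_fn`, and `baseline` are hypothetical names introduced here.

```python
import numpy as np


def insertion_curve(x, attributions, score_fn, baseline=0.0):
    """Score the input as features are inserted, most important first.

    Starts from an all-baseline input and restores one original feature at
    a time in descending attribution order. Returns len(x) + 1 scores,
    where entry k is the score with the top-k features inserted.
    """
    # Stable sort so that ties keep their original order.
    order = np.argsort(-attributions, kind="stable")
    current = np.full_like(x, baseline, dtype=float)
    scores = [score_fn(current)]
    for idx in order:
        current[idx] = x[idx]
        scores.append(score_fn(current))
    return np.array(scores)


# Toy usage: the "model" simply sums its input features.
x = np.array([1.0, 2.0, 3.0])
attributions = np.array([0.1, 0.9, 0.5])
curve = insertion_curve(x, attributions, lambda v: float(v.sum()))
```

A curve that rises quickly indicates that the attribution ranking places the features the model depends on first, which is why higher values at each insertion count are better.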

Theorems & Definitions (1)

  • Definition 1