ReX: A Framework for Incorporating Temporal Information in Model-Agnostic Local Explanation Techniques

Junhao Liu, Xin Zhang

TL;DR

ReX targets a gap in local, model-agnostic explanation techniques: they explain models over variable-length sequences poorly because they ignore temporal information. ReX injects that information in two ways. It augments the predicate language with 1D and 2D temporal predicates, and it extends perturbation sampling to generate variable-length inputs via t_per^R, which makes explanations temporally faithful without altering the core algorithms. The framework is instantiated on Anchors, LIME, and Kernel SHAP and evaluated on sentiment analysis, anomaly detection, and text generation tasks, where it substantially improves explanation fidelity and yields positive user-study outcomes at a manageable runtime overhead. This broadens the applicability of interpretable explanations to RNNs and transformers.

Abstract

Existing local model-agnostic explanation techniques are ineffective for machine learning models that consider inputs of variable lengths, as they do not consider temporal information embedded in these models. To address this limitation, we propose ReX, a general framework for incorporating temporal information in these techniques. Our key insight is that these techniques typically learn a model surrogate by sampling model inputs and outputs, and we can incorporate temporal information in a uniform way by only changing the sampling process and the surrogate features. We instantiate our approach on three popular explanation techniques: Anchors, LIME, and Kernel SHAP. To evaluate the effectiveness of ReX, we apply our approach to six models in three different tasks. Our evaluation results demonstrate that our approach 1) significantly improves the fidelity of explanations, making model-agnostic techniques outperform a state-of-the-art model-specific technique on its target model, and 2) helps end users better understand the models' behaviors.
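
The key insight in the abstract (change only the sampling process and the surrogate features) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `perturb_variable_length` and `surrogate_features`, the deletion probability `p_delete`, and the choice of random token deletion as the perturbation are all assumptions made for illustration. The sketch shows one way a perturbed sample can change length while still recording where each surviving token ended up, so a surrogate could use positions as features.

```python
import random


def perturb_variable_length(tokens, rng, p_delete=0.3):
    """Randomly delete tokens so the perturbed sample may be shorter.

    Hypothetical helper: returns (kept, sample), where kept[i] is the
    original index of sample[i].
    """
    kept = [i for i in range(len(tokens)) if rng.random() >= p_delete]
    sample = [tokens[i] for i in kept]
    return kept, sample


def surrogate_features(n_tokens, kept):
    """Build two assumed feature groups for a surrogate model.

    presence[i]  is 1 if original token i survived, else 0.
    positions[i] is the new position of original token i, or -1 if deleted.
    """
    new_pos = {orig: pos for pos, orig in enumerate(kept)}
    presence = [1 if i in new_pos else 0 for i in range(n_tokens)]
    positions = [new_pos.get(i, -1) for i in range(n_tokens)]
    return presence, positions


rng = random.Random(0)  # fixed seed so the run is repeatable
tokens = "the movie was not bad at all".split()
kept, sample = perturb_variable_length(tokens, rng)
presence, positions = surrogate_features(len(tokens), kept)
```

In a full pipeline, each `sample` would be passed to the model being explained, and the resulting (features, output) pairs would train the surrogate. A fixed-length perturbation (for example, masking tokens in place) would leave every position unchanged, which is why positional features only become informative once samples can vary in length.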

Paper Structure (33 sections, 12 equations, 7 figures, 9 tables, 2 algorithms)

Figures (7)

  • Figure 1: Example Anchors explanations (a) and ReX-augmented Anchors explanations (b).
  • Figure 2: Example explanations generated by LIME (left) and ReX-augmented LIME (right).
  • Figure 3: Anchors and ReX-augmented Anchors (denoted as Anchors*) explanations for an anomaly detection RNN. Anchors: if the values of $x_{414}$, $x_{417}$, $x_{416}$, $x_{415}$, $x_{418}$, $x_{413}$, $x_{419}$, and $x_{412}$ remain unchanged, $x_{428}$ will be classified as an anomaly. Anchors*: if the values of $x_{413}$ and $x_{425}$ remain unchanged, and there are at least 3 data points between them, $x_{428}$ will be classified as an anomaly.
  • Figure 4: The workflow of generating explanations by a local model-agnostic explanation technique.
  • Figure 5: Average accuracy and AUROC of explanations for the four sentiment analysis models under different settings. The explanations are generated by LIME and three augmented versions of it: LIME with ReX, with ReX without 1D-predicates, and with ReX without 2D-predicates.
  • ...and 2 more figures

Theorems & Definitions (3)

  • Definition 1: 1-D Temporal Predicate
  • Definition 2: 2-D Temporal Predicate
  • Definition 3: Explanation with Temporal Information
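
The formal definitions are not reproduced in this listing, so the following Python sketch is only one plausible reading of them, based on the definition names and the Figure 3 caption. It assumes a 1-D temporal predicate constrains the position of a single feature, and a 2-D temporal predicate constrains the number of data points between two features. The class names `Pred1D` and `Pred2D`, the helper `positions_after`, and the operator set are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List

# Comparison operators allowed in this sketch (an assumed set).
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}


def positions_after(kept: List[int]) -> Dict[int, int]:
    """Map each surviving original index to its position in the perturbed input."""
    return {orig: pos for pos, orig in enumerate(kept)}


@dataclass(frozen=True)
class Pred1D:
    """Assumed 1-D form: position(feature) op bound."""
    feature: int
    op: str
    bound: int

    def holds(self, pos: Dict[int, int]) -> bool:
        if self.feature not in pos:  # feature was deleted
            return False
        return OPS[self.op](pos[self.feature], self.bound)


@dataclass(frozen=True)
class Pred2D:
    """Assumed 2-D form: (data points between first and second) op bound."""
    first: int
    second: int
    op: str
    bound: int

    def holds(self, pos: Dict[int, int]) -> bool:
        if self.first not in pos or self.second not in pos:
            return False
        gap = pos[self.second] - pos[self.first] - 1  # points strictly between
        return OPS[self.op](gap, self.bound)


# Figure 3's Anchors* rule: x_413 and x_425 kept, at least 3 points between them.
rule = Pred2D(first=413, second=425, op=">=", bound=3)
unperturbed = positions_after(list(range(400, 429)))
squeezed = positions_after([413, 420, 421, 425, 428])  # only 2 points between
```

Under these assumptions, `rule.holds(unperturbed)` is true because 11 data points separate $x_{413}$ and $x_{425}$ in the original series, while `rule.holds(squeezed)` is false because deletions leave only 2 points between them. A rule that only checks whether the two values are unchanged could not tell these inputs apart.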