ReX: A Framework for Incorporating Temporal Information in Model-Agnostic Local Explanation Techniques
Junhao Liu, Xin Zhang
TL;DR
ReX addresses a shortcoming of local, model-agnostic explanation techniques on models that process variable-length sequences: these techniques ignore the temporal information such models rely on. ReX injects this information into the explanation process in two ways. It augments the predicate language with one-dimensional (1D) and two-dimensional (2D) temporal predicates, and it extends perturbation sampling to generate variable-length inputs via t_per^R. The result is temporally faithful explanations without any change to the techniques' core algorithms. The framework is instantiated on Anchors, LIME, and Kernel SHAP and evaluated on sentiment analysis, anomaly detection, and text generation tasks. In these evaluations it substantially improves fidelity, produces positive user-study outcomes, and incurs manageable runtime overhead. The approach broadens the applicability of interpretable explanations to RNNs and transformers, supporting more reliable and actionable understanding of these models in practice.
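To make the two ingredients concrete, here is a minimal sketch under our own assumptions. The predicate forms `token_at_or_after` (a 1D predicate on the position of one token) and `token_before` (a 2D predicate on the relative order of two tokens), as well as the deletion-based sampler `perturb_variable_length`, are hypothetical illustrations. They are not ReX's actual predicate grammar or its perturbation model t_per^R.

```python
import random
from typing import Callable, List, Sequence

Predicate = Callable[[Sequence[str]], bool]

def token_at_or_after(tok: str, pos: int) -> Predicate:
    """Hypothetical 1D temporal predicate: `tok` occurs at some index >= `pos`."""
    return lambda seq: any(t == tok and i >= pos for i, t in enumerate(seq))

def token_before(a: str, b: str) -> Predicate:
    """Hypothetical 2D temporal predicate: some `a` occurs strictly before some `b`."""
    def pred(seq: Sequence[str]) -> bool:
        # It suffices to look for a `b` after the first occurrence of `a`.
        first_a = next((i for i, t in enumerate(seq) if t == a), None)
        return first_a is not None and any(
            t == b and i > first_a for i, t in enumerate(seq))
    return pred

def perturb_variable_length(seq: Sequence[str], keep_prob: float,
                            rng: random.Random) -> List[str]:
    """Hypothetical sampler: delete (rather than mask) each token independently,
    so samples vary in length while surviving tokens keep their relative order."""
    return [t for t in seq if rng.random() < keep_prob]

if __name__ == "__main__":
    sentence = "the movie was not good at all".split()
    print(token_before("not", "good")(sentence))   # True
    print(token_at_or_after("good", 4)(sentence))  # True: "good" is at index 4
    shorter = sentence[1:]                         # as if "the" had been deleted
    print(token_at_or_after("good", 4)(shorter))   # False: "good" shifted to index 3
    rng = random.Random(0)
    for _ in range(3):
        z = perturb_variable_length(sentence, 0.7, rng)
        print(len(z), z, token_before("not", "good")(z))
```

Deleting tokens shifts positions. A 1D predicate can therefore change value on a shorter sample even when its token survives. A masking scheme that keeps the input length fixed would not exercise this case.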
Abstract
Existing local model-agnostic explanation techniques are ineffective for machine learning models that take inputs of variable length, because they ignore the temporal information embedded in these models. To address this limitation, we propose ReX, a general framework for incorporating temporal information into these techniques. Our key insight is that these techniques typically learn a surrogate of the model by sampling model inputs and outputs, so we can incorporate temporal information uniformly by changing only the sampling process and the surrogate features. We instantiate our approach on three popular explanation techniques: Anchors, LIME, and Kernel SHAP. To evaluate the effectiveness of ReX, we apply it to six models in three different tasks. Our evaluation results demonstrate that our approach 1) significantly improves the fidelity of explanations, enabling model-agnostic techniques to outperform a state-of-the-art model-specific technique on its target model, and 2) helps end users better understand the models' behavior.
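The key insight that only the sampling process and the surrogate features need to change can be illustrated with a LIME-style sketch. Under our own assumptions, samples are produced by deleting tokens, so their length varies. Each sample is described by binary predicate features, and a weighted linear surrogate is fit by least squares. The toy `black_box`, the predicates `contains` and `precedes`, the length-based proximity kernel, and `fit_temporal_surrogate` are all hypothetical. This is neither ReX's implementation nor the LIME library's API.

```python
import random
from typing import Callable, Dict, List, Sequence

import numpy as np

Predicate = Callable[[Sequence[str]], bool]

def precedes(a: str, b: str) -> Predicate:
    """Hypothetical temporal predicate: some `a` occurs strictly before some `b`."""
    def pred(seq: Sequence[str]) -> bool:
        first_a = next((i for i, t in enumerate(seq) if t == a), None)
        return first_a is not None and any(
            t == b and i > first_a for i, t in enumerate(seq))
    return pred

def contains(tok: str) -> Predicate:
    """Ordinary (non-temporal) presence feature."""
    return lambda seq: tok in seq

def black_box(seq: Sequence[str]) -> float:
    """Toy stand-in for a sequence model: low score if "not" precedes "good"."""
    if precedes("not", "good")(seq):
        return 0.1
    return 0.9 if "good" in seq else 0.5

def fit_temporal_surrogate(x: List[str], model: Callable[[Sequence[str]], float],
                           preds: Dict[str, Predicate], n_samples: int = 500,
                           keep_prob: float = 0.6, seed: int = 0) -> Dict[str, float]:
    """LIME-style weighted linear surrogate whose features are predicate values."""
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [t for t in x if rng.random() < keep_prob]  # variable-length sample
        rows.append([1.0] + [float(p(z)) for p in preds.values()])  # intercept + features
        targets.append(model(z))
        weights.append(len(z) / len(x))  # assumed kernel: fraction of tokens kept
    sw = np.sqrt(np.array(weights))
    # Weighted least squares: scale each row and target by sqrt(weight).
    coef, *_ = np.linalg.lstsq(np.array(rows) * sw[:, None],
                               np.array(targets) * sw, rcond=None)
    return dict(zip(["(intercept)", *preds], coef.tolist()))

if __name__ == "__main__":
    sentence = "good acting but not good writing".split()
    preds = {"contains(good)": contains("good"),
             "contains(not)": contains("not"),
             "not<good": precedes("not", "good")}
    for name, w in fit_temporal_surrogate(sentence, black_box, preds).items():
        print(f"{name:15s} {w:+.3f}")
```

The toy model happens to be exactly linear in these three features. The fit should therefore recover an intercept of 0.5 and weights of +0.4 for contains(good), -0.8 for not<good, and 0 for contains(not), up to floating-point error. In this sentence, "not" sits between two occurrences of "good", so presence features alone cannot fit the model exactly. The order-aware feature is what lets the linear surrogate match the model here.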
