Improving the Evaluation and Actionability of Explanation Methods for Multivariate Time Series Classification

Davide Italo Serramazza, Thach Le Nguyen, Georgiana Ifrim

TL;DR

The paper addresses two challenges for Multivariate Time Series Classification (MTSC): evaluating explanation methods quantitatively and making their explanations actionable. It analyzes InterpretTime, identifies weaknesses such as reliance on data augmentation and single-mask perturbations, and proposes improvements including multiple masking and time-series chunking. Using alignment with ground truth and experiments on real-world datasets, it shows that perturbation-based methods, particularly SHAP and Feature Ablation, explain classifiers well and that channel-level explanations can guide channel selection for MTSC, reducing data size and improving classifier accuracy. The work thus moves explanation for MTSC beyond visualization toward more reliable, task-driven use.

Abstract

Explanation for Multivariate Time Series Classification (MTSC) is an important topic that is underexplored. There are very few quantitative evaluation methodologies and even fewer examples of actionable explanation, where the explanation methods are shown to objectively improve specific computational tasks on time series data. In this paper we focus on analyzing InterpretTime, a recent evaluation methodology for attribution methods applied to MTSC. We showcase some significant weaknesses of the original methodology and propose ideas to improve both its accuracy and efficiency. Unlike related work, we go beyond evaluation and also showcase the actionability of the produced explainer ranking, by using the best attribution methods for the task of channel selection in MTSC. We find that perturbation-based methods such as SHAP and Feature Ablation work well across a set of datasets, classifiers and tasks, and outperform gradient-based methods. We apply the best-ranked explainers to channel selection for MTSC and show significant data size reduction and improved classifier accuracy.
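
To make the idea of using a perturbation-based explainer for channel selection concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a zero baseline for ablated channels, an input layout of `(instances, channels, time points)`, and a classifier exposed as a `predict_proba`-style function; the names `channel_ablation_scores`, `select_channels` and `toy_predict_proba`, as well as the synthetic data, are hypothetical and introduced only for illustration.

```python
import numpy as np


def channel_ablation_scores(predict_proba, X, baseline=0.0):
    """Channel-level feature ablation (illustrative sketch).

    X has shape (n_instances, n_channels, n_timepoints).
    The score of a channel is the mean drop in the probability of the
    originally predicted class when that channel is replaced by `baseline`.
    """
    X = np.asarray(X, dtype=float)
    probs = predict_proba(X)                    # (n_instances, n_classes)
    predicted = probs.argmax(axis=1)            # class predicted on intact data
    rows = np.arange(X.shape[0])
    reference = probs[rows, predicted]

    scores = np.zeros(X.shape[1])
    for c in range(X.shape[1]):
        X_ablated = X.copy()
        X_ablated[:, c, :] = baseline           # ablate one whole channel
        ablated = predict_proba(X_ablated)[rows, predicted]
        scores[c] = np.mean(reference - ablated)
    return scores


def select_channels(scores, k):
    """Indices of the k channels with the highest ablation score."""
    return np.argsort(scores)[::-1][:k]


def toy_predict_proba(X):
    """Hypothetical stand-in classifier: relies heavily on channel 0,
    slightly on channel 2, and ignores channel 1."""
    logit = 3.0 * X[:, 0, :].mean(axis=1) + 0.5 * X[:, 2, :].mean(axis=1)
    p1 = 1.0 / (1.0 + np.exp(-logit))
    return np.column_stack([1.0 - p1, p1])


# Synthetic data (assumption): 200 instances, 3 channels, 50 time points.
rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, size=(200, 3, 50))

scores = channel_ablation_scores(toy_predict_proba, X)
kept = select_channels(scores, k=2)
print("ablation scores per channel:", scores)
print("channels kept:", kept)
```

In this toy setting the ignored channel receives a score of exactly zero and is dropped, so a classifier retrained on the kept channels would see one third less data. Whether accuracy is preserved or improved on real datasets depends on the classifier and the explainer, which is the question the paper studies.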

Paper Structure (24 sections, 3 equations, 9 figures, 11 tables)

Figures (9)

  • Figure 1: Example heat maps for two different input types. Left: an image and its corresponding heat map (image from cam2015). Right: a three-channel time series from the Counter Movement Jump (CMJ) dataset le2019interpretable, where each of the 3 plots corresponds to one channel recording acceleration along the $x$, $y$, or $z$ axis (image from serramazza2023evaluating). In both cases, darker red indicates that the corresponding item is more important for the classification.
  • Figure 2: The same instance from the Military Press (MP) dataset, composed of 8 channels recording the $y$ positions of the left and right wrists, elbows, shoulders and hips. The figure illustrates a very frequent problem: two different attribution methods produce very different heat maps for the same instance (image from serramazza2023evaluating).
  • Figure 3: Global Gaussian
  • Figure 4: Local Gaussian
  • Figure 5: Normal distribution
  • ...and 4 more figures