Class-Dependent Perturbation Effects in Evaluating Time Series Attributions
Gregor Baer, Isel Grau, Chao Zhang, Pieter Van Gorp
TL;DR
The paper investigates class-dependent effects in perturbation-based evaluation of time series feature attributions and shows that perturbation effectiveness can vary across predicted classes. It introduces the class-adjusted degradation score $DS_c$ with a cross-class penalty $\Delta$, defined as $DS_c(\alpha) = \overline{DS} - \alpha \Delta$, to balance aggregate attribution quality against cross-class consistency. Through experiments on four UCR datasets using two architectures and five attribution methods, it demonstrates that high aggregate performance often coincides with pronounced class-specific effects, and that the optimal perturbation strategy varies by dataset and model. The work provides practical guidance for evaluation, including class-stratified analyses and the $DS_c$ penalty, and outlines future directions toward class-aware perturbation strategies and adaptive evaluation procedures.
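As a rough illustration of the class-adjusted score described above, the following Python sketch stratifies per-sample degradation scores by predicted class and then applies the penalty. The summary does not specify how the aggregate score or the penalty Delta are computed, so two choices here are assumptions made purely for illustration: the aggregate is taken as the mean over all samples, and Delta is taken as the range (maximum minus minimum) of the per-class means. The function names and the toy scores are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def class_stratified_means(scores, predicted_classes):
    """Average per-sample degradation scores within each predicted class."""
    by_class = defaultdict(list)
    for score, cls in zip(scores, predicted_classes):
        by_class[cls].append(score)
    return {cls: mean(vals) for cls, vals in by_class.items()}


def example_delta(class_means):
    """ASSUMPTION: cross-class penalty as the range of per-class means.

    The source only names Delta as a cross-class penalty; this particular
    definition is a stand-in chosen for illustration.
    """
    vals = list(class_means.values())
    return max(vals) - min(vals)


def class_adjusted_ds(ds_bar, delta, alpha=1.0):
    """Formula from the summary: DS_c(alpha) = DS_bar - alpha * Delta."""
    return ds_bar - alpha * delta


# Toy, made-up scores for four samples predicted as class 0 or class 1.
scores = [0.5, 0.25, 0.0, 0.25]
predicted = [0, 0, 1, 1]

per_class = class_stratified_means(scores, predicted)
ds_bar = mean(scores)  # ASSUMPTION: aggregate score as the mean over all samples
delta = example_delta(per_class)
ds_c = class_adjusted_ds(ds_bar, delta, alpha=0.5)

print(per_class, ds_bar, delta, ds_c)
```

With alpha set to 0 the score reduces to the aggregate, and larger alpha values penalize attribution and perturbation combinations whose effectiveness differs more strongly between classes.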
Abstract
As machine learning models become increasingly prevalent in time series applications, Explainable Artificial Intelligence (XAI) methods are essential for understanding their predictions. Within XAI, feature attribution methods aim to identify which input features contribute the most to a model's prediction, with their evaluation typically relying on perturbation-based metrics. Through systematic empirical analysis across multiple datasets, model architectures, and perturbation strategies, we reveal previously overlooked class-dependent effects in these metrics: they show varying effectiveness across classes, achieving strong results for some while remaining less sensitive to others. In particular, we find that the most effective perturbation strategies often demonstrate the most pronounced class differences. Our analysis suggests that these effects arise from the learned biases of classifiers, indicating that perturbation-based evaluation may reflect specific model behaviors rather than intrinsic attribution quality. We propose an evaluation framework with a class-aware penalty term to help assess and account for these effects in evaluating feature attributions, offering particular value for class-imbalanced datasets. Although our analysis focuses on time series classification, these class-dependent effects likely extend to other structured data domains where perturbation-based evaluation is common.
