Validation, Robustness, and Accuracy of Perturbation-Based Sensitivity Analysis Methods for Time-Series Deep Learning Models
Zhengguang Wang
TL;DR
This work examines how perturbation-based sensitivity analysis (SA) methods perform on time-series deep learning (DL) models, particularly transformers. It benchmarks multiple SA methods (Feature Ablation, Feature Occlusion, and Morris) across several DL architectures (TFT, TimesNet, Autoformer, DLinear, PatchTST) on a county-level COVID-19 dataset with eight age-group features, using Spearman correlation against ground-truth age-group cases. The study investigates whether SA outputs are consistent across methods and models and how well they align with the ground truth, with the aim of establishing the validity, robustness, and accuracy of perturbation-based interpretability in time-series settings. The findings are intended to inform best practices for interpretable analytics and to support policymakers' confidence when they use SA-derived insights for decision-making.
Abstract
This work evaluates interpretability methods for time-series deep learning. Sensitivity analysis (SA) assesses how changes in the input affect the output and is a key component of interpretation. Among post-hoc interpretation methods, which include back-propagation, perturbation, and approximation, my work investigates perturbation-based sensitivity analysis methods on modern Transformer models and benchmarks their performance. Specifically, my work answers three research questions: 1) Do different SA methods yield comparable outputs and attribute importance rankings? 2) Under the same SA method, do different deep learning (DL) models change the output of the sensitivity analysis? 3) How well do the results of SA methods align with the ground truth?
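
To make the third research question concrete, the following minimal sketch shows one way a perturbation-based score can be computed and then compared with a reference ranking through Spearman correlation. It is not the implementation used in this study. The linear `toy_model`, the synthetic input windows, the doubling weights that serve as a stand-in "ground truth", and the zero baseline are all assumptions made for illustration; the helper names `feature_ablation`, `ranks`, and `spearman` are hypothetical.

```python
import math

N_FEATURES = 8   # eight age-group features, as in the dataset described above
N_STEPS = 7      # assumed window length (illustrative)
N_WINDOWS = 50   # assumed number of input windows (illustrative)

# Assumed toy weights: each feature matters twice as much as the previous one.
WEIGHTS = [2.0 ** j for j in range(N_FEATURES)]

def toy_model(window):
    """Hypothetical stand-in for a trained forecaster: a weighted sum of all inputs."""
    return sum(w * x for row in window for w, x in zip(WEIGHTS, row))

def feature_ablation(model, windows, n_features, baseline=0.0):
    """Mean absolute output change when one feature is replaced by a baseline value."""
    scores = []
    for j in range(n_features):
        total = 0.0
        for window in windows:
            ablated = [[baseline if k == j else x for k, x in enumerate(row)]
                       for row in window]
            total += abs(model(window) - model(ablated))
        scores.append(total / len(windows))
    return scores

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)

# Deterministic synthetic inputs in [0, 1]; real data would replace this.
windows = [[[((7 * i + 3 * t + 5 * j) % 11) / 10 for j in range(N_FEATURES)]
            for t in range(N_STEPS)]
           for i in range(N_WINDOWS)]

scores = feature_ablation(toy_model, windows, N_FEATURES)
rho = spearman(scores, WEIGHTS)  # WEIGHTS plays the role of the reference ranking
print("ablation scores:", [round(s, 2) for s in scores])
print("Spearman rho vs. reference:", rho)
```

In this toy setting the ablation ranking matches the reference ranking by construction, because the model is linear and the weights are well separated. With a trained DL model and real case counts, the agreement is an empirical question, which is what the benchmark measures.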
