CAFO: Feature-Centric Explanation on Time Series Classification

Jaeho Kim; Seok-Ju Hahn; Yoontae Hwang; Junghye Lee; Seulki Lee

TL;DR

CAFO addresses the need for feature-centric explanations in multivariate time series (MTS) by introducing a channel-attention-based explainer with a depthwise channel attention module (DepCA) and QR-Ortho regularization, which promotes feature separability. By converting time series into image-like representations, CAFO obtains global and class-wise feature importance scores (GI and CWRI) and evaluates them with ROAR-inspired metrics and the area between curves (ABC), yielding more stable and interpretable feature rankings than traditional time-centric methods. The framework is validated on synthetic and real-world datasets, where it identifies important features more accurately, especially class-specific ones, and agrees with domain knowledge. These findings suggest CAFO as a practical foundation for robust feature-centric explainability in MTS and as a route to sensor selection and cost-efficient design in industrial settings.

Abstract

In multivariate time series (MTS) classification, finding the important features (e.g., sensors) for model performance is crucial yet challenging due to the complex, high-dimensional nature of MTS data, intricate temporal dynamics, and the necessity for domain-specific interpretations. Current explanation methods for MTS mostly focus on time-centric explanations, apt for pinpointing important time periods but less effective in identifying key features. This limitation underscores the pressing need for a feature-centric approach, a vital yet often overlooked perspective that complements time-centric analysis. To bridge this gap, our study introduces a novel feature-centric explanation and evaluation framework for MTS, named CAFO (Channel Attention and Feature Orthogonalization). CAFO employs a convolution-based approach with channel attention mechanisms, incorporating a depth-wise separable channel attention module (DepCA) and a QR decomposition-based loss for promoting feature-wise orthogonality. We demonstrate that this orthogonalization enhances the separability of attention distributions, thereby refining and stabilizing the ranking of feature importance. This improvement in feature-wise ranking enhances our understanding of feature explainability in MTS. Furthermore, we develop metrics to evaluate global and class-specific feature importance. Our framework's efficacy is validated through extensive empirical analyses on two major public benchmarks and real-world datasets, both synthetic and self-collected, specifically designed to highlight class-wise discriminative features. The results confirm CAFO's robustness and informative capacity in assessing feature importance in MTS classification tasks. This study not only advances the understanding of feature-centric explanations in MTS but also sets a foundation for future explorations in feature-centric explanations.
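As a rough illustration of what a QR-based orthogonality measure can look like, the sketch below scores how far the columns of an attention matrix are from being mutually orthogonal. It is not the paper's QR-Ortho loss, whose exact form is not given in this summary. The function name `qr_offdiag_penalty`, the (samples x features) layout, and the toy matrices are assumptions made for illustration. It relies on the fact that, for A = QR, the columns of A are mutually orthogonal exactly when R is diagonal.

```python
import numpy as np


def qr_offdiag_penalty(attn):
    """Sum of squared strictly-upper-triangular entries of R, where A = QR.

    attn: array of shape (n_samples, n_features), n_samples >= n_features,
    with no all-zero column (hypothetical layout, assumed for this sketch).
    The value is 0 when the feature columns are mutually orthogonal and
    grows as the columns overlap.
    """
    a = np.asarray(attn, dtype=float)
    # Scale each feature column to unit length so the penalty ignores magnitude.
    a = a / np.linalg.norm(a, axis=0, keepdims=True)
    # mode="r" returns only the upper-triangular factor R.
    r = np.linalg.qr(a, mode="r")
    return float(np.sum(np.triu(r, k=1) ** 2))


# Toy data (made up): three features over four samples.
orthogonal = np.eye(4)[:, :3]
overlapping = np.array([
    [1.0, 0.9, 0.1],
    [0.0, 0.1, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
])

penalty_orthogonal = qr_offdiag_penalty(orthogonal)
penalty_overlapping = qr_offdiag_penalty(overlapping)
```

In a training setting, a term of this kind would be added to the classification loss with a weighting coefficient; how CAFO weights and computes its own term is left to the paper.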

Paper Structure (50 sections, 14 equations, 20 figures, 6 tables)

Introduction
Preliminaries and Related Works
Preliminaries
Image Encoding of MTS
Channel Attention (CA) Modules
Multivariate Time Series Explanation
CAFO: Channel Attention and Feature Orthogonalization
Depthwise Channel Attention (DepCA)
Enhancing Feature Separability: QR-Ortho
Feature Explanation Measures
Dataset and Baseline
Results
Evaluation of Global Importance
Consistency in GI Ranks
Within Models
...and 35 more sections

Figures (20)

Figure 1: Overview of CAFO. (A) End-to-end training. Raw time series are converted into images using image encoding methods, and the DepCA+QR module then extracts channel-wise attention scores. These attention scores are multiplied element-wise with the image features for end-to-end model training. (B) DepCA assesses feature contributions, while the QR-Ortho loss reduces feature redundancy through orthogonality regularization. (C) Feature importance calculation. The attention scores are used to explain MTS data via the Global Importance (GI) and Class-Wise Relative Importance (CWRI) metrics.
Figure 2: Visualization of the channel attention (CA) values using t-SNE [van2008visualizing] for CBAM [woo2018cbam], SE [hu2018squeeze], SIMAM [yang2021simam], and the proposed DepCA module on the Gilon dataset [kim2023multi]. The Calinski-Harabasz score [calinski1974dendrite] at the bottom indicates their clustering performance (the higher, the better). As observed, DepCA effectively captures sample and class-specific information even though the CA scores are computed in the early layer of the network, in contrast to existing methods [woo2018cbam, hu2018squeeze] that compute the CA scores in latent channel spaces (middle layers of the network).
Figure 3: Orthogonal regularization on the feature dimension of the attention scores enhances separability. With the QR-Ortho loss, previously overlapping attention distributions in the Gilon dataset [kim2023multi] become more distinct, and the effect is consistent across five-fold cross-validation.
Figure 4: RemOve And Retrain (ROAR) with Gilon [kim2023multi]. The feature ranks of the Gilon task were first identified by CAFO using the full set of 14 features; the ranks may vary across figures. To assess the importance of each feature, we systematically removed features from both the train and test datasets, keeping their distributions consistent. Features were removed progressively in two orders: most important first (red, 'Truth') and least important first (blue, 'Inverse'). After each removal, the model was retrained and its accuracy was evaluated. The X-axis represents the number of features removed (zero indicating no removal), while the Y-axis shows the model's accuracy. Accuracy declines notably when key features are removed, whereas removing less important features has minimal impact. The area between curves (ABC) metric quantifies the gap between the two curves, where a higher ABC indicates a better feature-wise ranking. The first row shows the models trained with cross-entropy (CE) alone, while the second row shows CE combined with QR-Ortho (our approach). ABC scores improve markedly across all models, underscoring QR-Ortho's efficacy in identifying pivotal features.
Figure 5: (A) Global Importance (GI) scores for the MS dataset [morris2014recofit]; $\mathbf{x}_{0}$ to $\mathbf{x}_{5}$ denote the features. (B) An example of Class-Wise Relative Importance (CWRI) scores for WhichFinger: columns represent sensors (features), rows denote classes, and cell values are CWRI scores. Red indicates that a feature has higher relative importance for the class, whereas blue indicates lower relative importance for that class.
...and 15 more figures
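The ABC metric described in the Figure 4 caption can be sketched as the area between the 'Inverse' and 'Truth' accuracy curves. The snippet below is a minimal illustration, assuming unit spacing between removal steps and a plain trapezoidal rule; any normalization the paper applies is not stated here and is not reproduced. The function name `area_between_curves` and the accuracy values are hypothetical.

```python
import numpy as np


def area_between_curves(acc_truth, acc_inverse):
    """Trapezoidal area between the 'Inverse' and 'Truth' accuracy curves.

    acc_truth:   accuracy after removing 0, 1, 2, ... of the MOST important features.
    acc_inverse: accuracy after removing 0, 1, 2, ... of the LEAST important features.
    A larger value means the ranking separates important from unimportant
    features more clearly.
    """
    truth = np.asarray(acc_truth, dtype=float)
    inverse = np.asarray(acc_inverse, dtype=float)
    if truth.shape != inverse.shape or truth.ndim != 1:
        raise ValueError("both curves must be 1-D and of equal length")
    gap = inverse - truth
    # Trapezoidal rule with one feature removed per step (unit spacing).
    return float(np.sum((gap[1:] + gap[:-1]) / 2.0))


# Made-up accuracies for 0 to 3 removed features.
acc_truth = [0.90, 0.70, 0.50, 0.40]    # drops quickly: key features removed first
acc_inverse = [0.90, 0.88, 0.85, 0.80]  # stays high: weak features removed first

abc = area_between_curves(acc_truth, acc_inverse)
```

Both curves start from the same point (no features removed), so the gap at step zero is zero and the area is driven entirely by how quickly the two removal orders diverge.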
