PHeatPruner: Interpretable Data-centric Feature Selection for Multivariate Time Series Classification through Persistent Homology

Anh-Duy Pham, Olivier Basole Kashongwe, Martin Atzmueller, Tim Römer

TL;DR

PHeatPruner tackles the challenge of achieving both high performance and interpretability in multivariate time series (MTS) classification by unifying data-centric feature pruning via persistent homology with a sheaf‑theoretic explainability layer. The method transforms the correlation matrix of the MTS data into a distance matrix and constructs a Vietoris–Rips complex to identify topology‑driven feature consistency; it selects a threshold from the median of the death times and prunes the variables left unconnected at that threshold. It then augments the pruned data with explanatory vectors derived from a consistency filtration over a sheaf on the remaining simplicial complex, enabling interpretable predictions without sacrificing accuracy. Across the UEA MTS benchmark and a mastitis‑detection dataset, PHeatPruner preserves or improves accuracy for Random Forest, CatBoost, XGBoost, and LightGBM, while reducing dimensionality by about 30% on average (up to 45% in some cases) and providing SHAP‑based insights into key features and their temporal interactions.
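
The pruning step summarized above can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: each variable is given as one flattened sequence, the distance is taken as 1 - |Pearson correlation|, only 0-dimensional persistent homology is used (its death times in a Vietoris–Rips filtration coincide with the edge lengths of a minimum spanning tree), and "unconnected" is read as "has no edge at the median death time". The function names `pearson`, `h0_death_times`, and `prune_variables` are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def h0_death_times(dist):
    """Death times of the H0 classes of a Vietoris-Rips filtration.

    They equal the edge lengths of a minimum spanning tree, computed here
    with Kruskal's algorithm and a union-find structure.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    deaths = []
    edges = sorted((dist[i][j], i, j) for i, j in combinations(range(n), 2))
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # two components merge, so one H0 class dies at scale d
            parent[ri] = rj
            deaths.append(d)
    return deaths


def prune_variables(series):
    """Return (indices of kept variables, threshold).

    Assumption: distance = 1 - |correlation|; a variable is pruned when it
    has no neighbour within the median H0 death time.
    """
    n = len(series)
    dist = [[0.0 if i == j else 1.0 - abs(pearson(series[i], series[j]))
             for j in range(n)] for i in range(n)]
    threshold = median(h0_death_times(dist))
    kept = [i for i in range(n)
            if any(j != i and dist[i][j] <= threshold for j in range(n))]
    return kept, threshold


# Toy data: variables 0-2 are strongly correlated, variable 3 is not.
toy = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [1, 2, 3, 5],
    [1, -1, -1, 1],
]
kept, threshold = prune_variables(toy)
print(kept, round(threshold, 4))
```

On this toy input the three mutually correlated variables stay connected at the median death time, while the fourth has no neighbour within the threshold and is dropped. A full persistent homology library would also expose higher-dimensional features; the spanning-tree shortcut only covers connectivity.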

Abstract

Balancing performance and interpretability in multivariate time series classification is a significant challenge due to data complexity and high dimensionality. This paper introduces PHeatPruner, a method integrating persistent homology and sheaf theory to address these challenges. Persistent homology facilitates the pruning of up to 45% of the applied variables while maintaining or enhancing the accuracy of models such as Random Forest, CatBoost, XGBoost, and LightGBM, all without depending on posterior probabilities or supervised optimization algorithms. Concurrently, sheaf theory contributes explanatory vectors that provide deeper insights into the data's structural nuances. The approach was validated using the UEA Archive and a mastitis detection dataset for dairy cows. The results demonstrate that PHeatPruner effectively preserves model accuracy. Furthermore, our results highlight PHeatPruner's key features, i.e. simplifying complex data and offering actionable insights without increasing processing time or complexity. This method bridges the gap between complexity reduction and interpretability, suggesting promising applications in various fields.

Paper Structure

This paper contains 18 sections, 3 equations, 3 figures, 2 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of PHeatPruner.
  • Figure 2: Persistent Homology-Based Feature Selection and Sheafification for Mastitis Detection.
  • Figure 3: SHAP Value Analysis of Input Features on XGBoost Model Outcomes for the Mastitis Class, illustrating the SHAP values for various input features and their impact on the XGBoost model’s predictions for mastitis. Features are color-coded, with red indicating higher values and blue indicating lower values. The x-axis represents the magnitude of each feature's impact on the model's output, showing how changes in feature values influence the model's prediction for the mastitis class.

Theorems & Definitions (4)

  • Definition: Vietoris-Rips Complex
  • Definition: Abstract Simplicial Complex
  • Definition: Persistent Homology
  • Definition: Assignment and Global Section
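
For orientation, the standard textbook form of the first definition is sketched below. The notation ($X$ for the set of variables, $d$ for the correlation-derived distance, $\varepsilon$ for the scale) is an assumption chosen here, and the paper's own statement may use a different convention, for example a scale of $2\varepsilon$ instead of $\varepsilon$.

```latex
% Vietoris-Rips complex of a finite metric space (X, d) at scale \varepsilon \ge 0:
% a non-empty subset of X is a simplex when all of its points are pairwise within \varepsilon.
\[
  \mathrm{VR}_{\varepsilon}(X)
  = \bigl\{\, \sigma \subseteq X \;:\; \sigma \neq \emptyset,\;
      d(x, y) \le \varepsilon \ \text{for all } x, y \in \sigma \,\bigr\}
\]
% Increasing the scale only adds simplices, which gives the filtration
% on which persistent homology is computed:
\[
  \varepsilon \le \varepsilon' \;\Longrightarrow\;
  \mathrm{VR}_{\varepsilon}(X) \subseteq \mathrm{VR}_{\varepsilon'}(X)
\]
```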