PHEATPRUNER: Interpretable Data-centric Feature Selection for Multivariate Time Series Classification through Persistent Homology

Anh-Duy Pham; Olivier Basole Kashongwe; Martin Atzmueller; Tim Römer

TL;DR

PHeatPruner addresses the trade-off between performance and interpretability in multivariate time series (MTS) classification by combining data-centric feature pruning via persistent homology with a sheaf-theoretic explainability layer. The method converts the correlation matrix of the MTS data into a distance matrix and builds a Vietoris–Rips complex on it to identify topology-driven feature consistency. It selects a threshold from the median of the death times and prunes the variables that remain unconnected at that threshold. It then augments the pruned data with explanatory vectors derived from a consistency filtration over a sheaf on the remaining simplicial complex, enabling interpretable predictions without sacrificing accuracy. On the UEA MTS benchmark and a mastitis-detection dataset, PHeatPruner preserves or improves accuracy for Random Forest, CatBoost, XGBoost, and LightGBM while reducing dimensionality by about 30% on average (up to 45% in some cases), and it provides SHAP-based insights into key features and their temporal interactions.
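The pruning step described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made for illustration: the input is taken to be an array of shape (samples, time steps, channels), channel correlations are computed after flattening samples and time, the distance is taken as 1 - |correlation|, and only 0-dimensional persistent homology is used. For a Vietoris–Rips filtration, the finite 0-dimensional death times coincide with the edge weights of a minimum spanning tree, so plain NumPy with a union-find is sufficient here. The function names `h0_death_times` and `prune_channels` are hypothetical.

```python
import numpy as np


def h0_death_times(dist):
    """Finite H0 death times of a Vietoris-Rips filtration.

    These equal the edge weights of a minimum spanning tree, found here
    with Kruskal's algorithm and a union-find structure.
    """
    n = dist.shape[0]
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    rows, cols = np.triu_indices(n, k=1)
    deaths = []
    for idx in np.argsort(dist[rows, cols]):
        a, b = find(int(rows[idx])), find(int(cols[idx]))
        if a != b:
            # Two components merge: one H0 class dies at this distance.
            parent[a] = b
            deaths.append(dist[rows[idx], cols[idx]])
    return np.array(deaths)


def prune_channels(X):
    """Return indices of channels kept and the threshold used.

    Assumption: X has shape (n_samples, n_timesteps, n_channels) and
    the distance between channels is 1 - |Pearson correlation|.
    """
    n_channels = X.shape[-1]
    flat = X.reshape(-1, n_channels)
    corr = np.corrcoef(flat, rowvar=False)
    dist = 1.0 - np.abs(corr)
    np.fill_diagonal(dist, 0.0)

    # Threshold taken as the median of the finite H0 death times.
    threshold = np.median(h0_death_times(dist))

    # A channel is kept if it has at least one neighbour within the threshold.
    adjacency = dist <= threshold
    np.fill_diagonal(adjacency, False)
    keep = np.flatnonzero(adjacency.any(axis=1))
    return keep, threshold


# Synthetic example: three strongly correlated channels and two noise channels.
rng = np.random.default_rng(0)
s = rng.normal(size=(20, 50))
X = np.stack(
    [
        s,
        s + 0.05 * rng.normal(size=s.shape),
        s + 0.05 * rng.normal(size=s.shape),
        rng.normal(size=s.shape),
        rng.normal(size=s.shape),
    ],
    axis=-1,
)
keep, threshold = prune_channels(X)
print("kept channels:", keep.tolist(), "threshold:", round(float(threshold), 3))
```

In this synthetic setting the two independent noise channels have no neighbour within the median threshold and are dropped, while the three correlated channels are retained. A persistent homology library could replace the hand-written union-find when higher-dimensional features are of interest.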

Abstract

Balancing performance and interpretability in multivariate time series classification is a significant challenge due to data complexity and high dimensionality. This paper introduces PHeatPruner, a method that integrates persistent homology and sheaf theory to address these challenges. Persistent homology enables pruning of up to 45% of the input variables while maintaining or enhancing the accuracy of models such as Random Forest, CatBoost, XGBoost, and LightGBM, all without depending on posterior probabilities or supervised optimization algorithms. Concurrently, sheaf theory contributes explanatory vectors that provide deeper insights into the data's structural nuances. The approach was validated on the UEA Archive and a mastitis detection dataset for dairy cows. The results demonstrate that PHeatPruner effectively preserves model accuracy and highlight its key features, i.e., simplifying complex data and offering actionable insights without increasing processing time or complexity. The method bridges the gap between complexity reduction and interpretability, suggesting promising applications in various fields.
