HIERVAR: A Hierarchical Feature Selection Method for Time Series Analysis

Alireza Keshavarzian, Shahrokh Valaee

TL;DR

This work tackles the challenge of excessive, redundant features produced by random representation methods for time-series classification. It introduces HIERVAR, a two-phase hierarchical feature selector that first applies E-ROCKET knee-point pruning and then ANOVA-based F-score filtering to obtain a compact, discriminative feature set. Empirically, HIERVAR reduces feature counts by over 94 percent while maintaining or improving accuracy across MiniROCKET and RASTER representations, and it substantially lowers runtime. The approach enhances interpretability and enables efficient deployment on IoT and other resource-constrained platforms, without sacrificing predictive performance.

Abstract

Time series classification stands as a pivotal and intricate challenge across various domains, including finance, healthcare, and industrial systems. In contemporary research, there has been a notable upsurge in exploring feature extraction through random sampling. Unlike deep convolutional networks, these methods sidestep elaborate training procedures, yet they often necessitate generating a surplus of features to comprehensively encapsulate time series nuances. Consequently, some features may lack relevance to labels or exhibit multi-collinearity with others. In this paper, we propose a novel hierarchical feature selection method aided by ANOVA variance analysis to address this challenge. Through meticulous experimentation, we demonstrate that our method substantially reduces features by over 94% while preserving accuracy -- a significant advancement in the field of time series analysis and feature selection.
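
The ANOVA-based filtering mentioned above can be illustrated with a small sketch. It covers only the generic idea of ranking features by their one-way ANOVA F-score and keeping the highest-scoring ones; it is not the authors' implementation, and the E-ROCKET knee-point pruning phase is not shown. The function names, the `keep` parameter, and the toy data are assumptions made for illustration, since the text here gives no thresholds or implementation details.

```python
from collections import defaultdict


def anova_f_scores(X, y):
    """Return the one-way ANOVA F-score of each feature column in X.

    X is a list of rows (one list of floats per sample); y holds class labels.
    Generic textbook F-statistic, not the paper's exact procedure.
    """
    n_samples = len(X)
    n_features = len(X[0])

    # Group sample indices by class label.
    groups = defaultdict(list)
    for i, label in enumerate(y):
        groups[label].append(i)
    n_classes = len(groups)

    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        grand_mean = sum(column) / n_samples
        ss_between = 0.0  # variation of class means around the grand mean
        ss_within = 0.0   # variation of samples around their own class mean
        for idx in groups.values():
            values = [column[i] for i in idx]
            mean = sum(values) / len(values)
            ss_between += len(values) * (mean - grand_mean) ** 2
            ss_within += sum((v - mean) ** 2 for v in values)
        ms_between = ss_between / (n_classes - 1)
        ms_within = ss_within / (n_samples - n_classes)
        if ms_within == 0.0:
            # Feature is constant inside every class; avoid dividing by zero.
            scores.append(float("inf") if ms_between > 0.0 else 0.0)
        else:
            scores.append(ms_between / ms_within)
    return scores


def select_top_features(X, y, keep):
    """Return the sorted column indices of the `keep` highest-scoring features.

    `keep` is a hypothetical knob chosen for this sketch.
    """
    scores = anova_f_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


# Toy data: column 0 separates the classes, columns 1 and 2 mostly do not.
X_toy = [
    [1.0, 5.0, 0.3],
    [1.2, 5.1, 0.9],
    [0.9, 4.9, 0.1],
    [3.0, 5.0, 0.8],
    [3.1, 5.2, 0.2],
    [2.9, 4.8, 0.7],
]
y_toy = [0, 0, 0, 1, 1, 1]
selected = select_top_features(X_toy, y_toy, keep=1)
```

A feature whose class means differ strongly relative to its within-class spread receives a high F-score, so it survives the filter. Features with near-identical class means score close to zero and are dropped.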

Paper Structure

This paper contains 7 sections, 11 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Overall schema of random representation learning
  • Figure 2: Sorted F-scores for the FordB dataset from the UCR Archive.
  • Figure 3: Overall architecture of the proposed method
  • Figure 4: The trend of the number of features as the parameter $d$ varies.
  • Figure 5: Comparative performance analysis of MiniROCKET, E-ROCKET, and HIERVAR. Segmentation is based on the number of features relative to MiniROCKET. Average accuracy is depicted within a discernible range.