
Detach-ROCKET: Sequential feature selection for time series classification with random convolutional kernels

Gonzalo Uribarri, Federico Barone, Alessio Ansuini, Erik Fransén

TL;DR

Sequential Feature Detachment (SFD) identifies and prunes non-essential features in ROCKET-based models such as ROCKET, MiniRocket, and MultiRocket; the resulting pruned models are called Detach-ROCKET.

Abstract

Time Series Classification (TSC) is essential in fields like medicine, environmental science, and finance, enabling tasks such as disease diagnosis, anomaly detection, and stock price analysis. While machine learning models like Recurrent Neural Networks and InceptionTime are successful in numerous applications, they can face scalability issues due to computational requirements. Recently, ROCKET has emerged as an efficient alternative, achieving state-of-the-art performance and simplifying training by utilizing a large number of randomly generated features from the time series data. However, many of these features are redundant or non-informative, increasing computational load and compromising generalization. Here we introduce Sequential Feature Detachment (SFD) to identify and prune non-essential features in ROCKET-based models, such as ROCKET, MiniRocket, and MultiRocket. SFD estimates feature importance using model coefficients and can handle large feature sets without complex hyperparameter tuning. Testing on the UCR archive shows that SFD can produce models with better test accuracy using only 10% of the original features. We named these pruned models Detach-ROCKET. We also present an end-to-end procedure for determining an optimal balance between the number of features and model accuracy. On the largest binary UCR dataset, Detach-ROCKET improves test accuracy by 0.6% while reducing features by 98.9%. By enabling a significant reduction in model size without sacrificing accuracy, our methodology improves computational efficiency and contributes to model interpretability. We believe that Detach-ROCKET will be a valuable tool for researchers and practitioners working with time series data, who can find a user-friendly implementation of the model at https://github.com/gon-uri/detach_rocket.
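
The pruning loop described in the abstract (rank features by model coefficients, detach the least important ones, refit, repeat) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the `drop_fraction` and `min_features` parameters, and the use of a closed-form ridge regression on ±1 labels as a stand-in for a ridge classifier are all assumptions made for this example.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression weights (assumed stand-in for a ridge classifier)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def sequential_feature_detachment(X, y, drop_fraction=0.05, min_features=1, alpha=1.0):
    """Hypothetical SFD sketch.

    Returns a list of index arrays, one per pruning step, starting with the
    full feature set and ending with the smallest retained set.
    """
    active = np.arange(X.shape[1])
    history = [active.copy()]
    while active.size > min_features:
        # Refit on the currently retained features only.
        w = ridge_fit(X[:, active], y, alpha)
        # Detach at least one feature per step, never going below min_features.
        n_drop = max(1, int(drop_fraction * active.size))
        n_keep = max(min_features, active.size - n_drop)
        # Importance is estimated as the absolute value of each coefficient.
        order = np.argsort(-np.abs(w))
        active = np.sort(active[order[:n_keep]])
        history.append(active.copy())
    return history
```

Each entry of the returned list can then be scored on a validation set to trace accuracy against the percentage of retained features. In practice the feature matrix would come from a ROCKET-style transform; here it is left as a plain array.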

Paper Structure (27 sections, 11 equations, 5 figures, 3 tables, 1 algorithm)

Figures (5)

  • Figure 1: Performance comparison of pruned models using SFD relative to the full ROCKET model. The plot shows the number of pruned models with better, equal, and worse performance compared to the full model, plotted against the percentage of retained features. The solid lines represent the mean number over 25 iterations, while the shaded area indicates the standard deviation. The dashed vertical line marks $10\%$ of retained features.
  • Figure 2: Percentage change in test set accuracy (relative to the full ROCKET model) of pruned models using SFD as a function of retained features. Results are shown for the 42 studied datasets, each averaged over 25 realizations (1050 realizations in total). The solid line represents the mean, the dashed line represents the median, and the shaded area represents the interquartile range (25th to 75th percentile). For comparison, the figure also includes two alternative pruning strategies: random selection (in blue) and inverse-SFD (in green).
  • Figure 3: Accuracy (Acc) comparison between the pruned models with $10\%$ of retained features and the full ROCKET model, for both test and training sets. Each point depicts the average accuracy of one of the 42 datasets over 25 realizations. The color indicates the training size of the dataset, with the color range saturated at 800 instances for enhanced contrast.
  • Figure 4: Selection of optimal number of features using Detach-ROCKET. The grey curve represents the validation set accuracy achieved by the ridge classifier as a function of the percentage of retained features. Optimal values corresponding to $c=0.1$ and $c=10$ are highlighted in violet and orange, respectively. Dashed lines indicate the level sets for the optimized objective functions (see Equation \ref{eq:tradeoff}), showing that the function value changes perpendicularly to these lines.
  • Figure A1: Results of the two-sample Kolmogorov-Smirnov test as a function of the percentage of features retained. At each pruning step, across the 25 experimental realizations, we test whether the number of pruned models that are better than the full model differs from the number of pruned models that are worse than the full model (violet and orange curves in Figure 1). When feature retention is between $35.85\%$ and $5.10\%$, the two distributions are significantly different because better models outnumber worse ones. When feature retention is less than $1.73\%$, the distributions differ because there are significantly more worse models.
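
The selection step described in the Figure 4 caption can be illustrated with a short sketch. The trade-off equation itself is not reproduced in this excerpt, so the objective below (validation accuracy minus $c$ times the retained fraction) is an assumption, chosen only because a linear objective is consistent with the straight level sets mentioned in the caption. The function name `select_retention` and the sample values are hypothetical.

```python
def select_retention(retained, val_acc, c):
    """Pick the retained fraction maximizing an assumed linear objective.

    retained: fractions of features kept, each in (0, 1]
    val_acc:  validation accuracy measured at each retained fraction
    c:        trade-off weight; larger values favor smaller models
    """
    # Assumed objective: accuracy penalized linearly by model size.
    scores = [acc - c * q for q, acc in zip(retained, val_acc)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return retained[best], val_acc[best]


# Illustrative values only, not results from the paper.
retained = [1.0, 0.5, 0.1, 0.02]
val_acc = [0.90, 0.91, 0.90, 0.80]
small_c = select_retention(retained, val_acc, c=0.1)
large_c = select_retention(retained, val_acc, c=10)
```

With these made-up numbers, the smaller weight keeps 10% of the features, while the larger weight accepts a lower accuracy in exchange for keeping only 2%.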