Improving performance of heart rate time series classification by grouping subjects

Michael Beekhuizen, Arman Naseri, David Tax, Ivo van der Bilt, Marcel Reinders

TL;DR

Heart rate time series are noisier and provide less detail for activity classification than ECG/PPG, with substantial inter-subject variability. The study uses the BigIdeasLab_STEP dataset to evaluate window/stride effects, clustering of subjects, and the impact of handcrafted features on deep learning classifiers, along with misclassification analysis. Key findings show that larger window sizes and smaller strides improve accuracy, clustering subjects into similar groups reduces inter-subject variability, and incorporating handcrafted features with DL yields further gains, while misclassifications cluster around activity transitions. The results suggest a path toward semi-personalized HR-based activity recognition, though generalization is limited by dataset size and device heterogeneity, and future work should explore additional normalization strategies and richer subject metadata.

Abstract

Unlike the more commonly analyzed ECG or PPG data for activity classification, heart rate time series data is less detailed, often noisier and can contain missing data points. Using the BigIdeasLab_STEP dataset, which includes heart rate time series annotated with specific tasks performed by individuals, we sought to determine if general classification was achievable. Our analyses showed that the accuracy is sensitive to the choice of window/stride size. Moreover, we found variable classification performances between subjects due to differences in the physical structure of their hearts. Various techniques were used to minimize this variability. First of all, normalization proved to be a crucial step and significantly improved the performance. Secondly, grouping subjects and performing classification inside a group helped to improve performance and decrease inter-subject variability. Finally, we show that including handcrafted features as input to a deep learning (DL) network improves the classification performance further. Together, these findings indicate that heart rate time series can be utilized for classification tasks like predicting activity. However, normalization or grouping techniques need to be chosen carefully to minimize the issue of subject variability.
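The window/stride segmentation and the normalization step mentioned in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the helper names `sliding_windows` and `standardize_per_subject`, the synthetic signal, the treatment of window and stride as sample counts, and the choice of per-subject z-scoring as the normalization are all assumptions made for the example.

```python
import numpy as np

def sliding_windows(series, window, stride):
    """Cut a 1-D series into windows of length `window`, one every `stride` samples."""
    series = np.asarray(series, dtype=float)
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(series) < window:
        # Too short to form a single window.
        return np.empty((0, window))
    starts = range(0, len(series) - window + 1, stride)
    return np.stack([series[s:s + window] for s in starts])

def standardize_per_subject(series):
    """Z-score one subject's heart rate (assumed normalization, not the paper's exact method)."""
    series = np.asarray(series, dtype=float)
    std = series.std()
    return (series - series.mean()) / (std if std > 0 else 1.0)

# Synthetic heart rate (BPM) for one hypothetical subject.
rng = np.random.default_rng(0)
hr = 70 + 10 * np.sin(np.linspace(0, 6, 300)) + rng.normal(0, 2, 300)

windows = sliding_windows(standardize_per_subject(hr), window=120, stride=20)
print(windows.shape)  # (number of windows, window length)
```

A smaller stride produces more, and more strongly overlapping, windows from the same recording, which is one reason the choice of stride interacts with how the data is split for evaluation.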

Paper Structure (10 sections, 10 figures, 2 tables)

Figures (10)

  • Figure 1: Accuracy of an SVM trained with (a) a random split, in which windows of the same subject can appear in both the training and the test set, and (b) a leave-subject-out validation procedure, in which all windows of a subject are in either the training or the test set. Both are inspected for different window and stride sizes. For random splitting, the accuracy increases as the window size increases and the stride size decreases, whereas for the leave-subject-out procedure, the accuracy seems to converge to one point. The achieved accuracies are plotted on the y-axis and the stride sizes on the x-axis. The window sizes are represented by differently coloured lines: 120 (green), 100 (light blue), 80 (dark blue) and 50 (orange).
  • Figure 2: A cluster assignment with six clusters, obtained by applying the TimeSeriesKMeans clustering procedure from tslearn to each subject's time series of mean BPM per activity. Subplots from left to right represent the six clusters and the subjects included in them. Each grey line is the time series of a single subject. Red lines are the averages of the time series in the cluster. The x-axis shows the activities, numbered from 0 to 4, and the y-axis shows the heart rate in BPM.
  • Figure 3: Accuracies within each cluster for the Feature and Data standardization methods when training an SVM with the leave-one-subject-out validation procedure. Yellow and dark blue points represent the mean per cluster, and horizontal lines represent the performance of the SVM when no clustering is performed. Yellow/orange points show balanced accuracy and light/dark blue points show unbalanced (normal) accuracy. With Feature standardization (a), three of the four larger clusters have a mean accuracy higher than an SVM trained on all the data. With Data standardization (b), only two of the four larger clusters have a mean accuracy higher than an SVM trained on all the data.
  • Figure 4: Accuracies within each cluster for the Feature standardization method using temporal and statistical features and training an SVM with the leave-one-subject-out validation procedure. Yellow and dark blue points represent the mean per cluster, and horizontal lines represent the performance of the SVM when no clustering is performed. Yellow/orange points show balanced accuracy and light/dark blue points show unbalanced (normal) accuracy. With temporal features (a), three of the four larger clusters have a mean accuracy higher than an SVM trained on all the data. With statistical features (b), all four clusters have a mean accuracy higher than an SVM trained on all the data.
  • Figure 5: Confusion matrices for the per-window and the per-subject approach. The true labels are shown on the vertical axis and the predicted labels on the horizontal axis. The biggest difference can be seen in the predictions of the Rest and Activity classes.
  • ...and 5 more figures
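
The two evaluation schemes compared in Figure 1 differ in whether windows from one subject can end up on both sides of the split. The sketch below illustrates that difference on index arrays only; the helper names `leave_one_subject_out` and `random_split`, the four hypothetical subjects, and the 50 windows per subject are assumptions for the example and are not taken from the paper.

```python
import numpy as np

def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx), holding out each subject once."""
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        test_mask = subject_ids == subject
        yield subject, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

def random_split(n_windows, test_fraction, seed=0):
    """Split window indices at random, ignoring which subject each window came from."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_windows)
    n_test = int(round(n_windows * test_fraction))
    return np.sort(order[n_test:]), np.sort(order[:n_test])

# Hypothetical layout: 4 subjects with 50 windows each.
subject_ids = np.repeat(np.arange(4), 50)

# Random split: the same subjects typically appear in both train and test.
rand_train, rand_test = random_split(len(subject_ids), test_fraction=0.25)
shared = np.intersect1d(subject_ids[rand_train], subject_ids[rand_test])
print("random split, subjects in both sets:", shared.size)

# Leave-subject-out: no subject is ever shared between train and test.
for subject, train_idx, test_idx in leave_one_subject_out(subject_ids):
    overlap = np.intersect1d(subject_ids[train_idx], subject_ids[test_idx])
    print(f"held-out subject {subject}: {test_idx.size} test windows, "
          f"{overlap.size} shared subjects")
```

With overlapping windows, a random split can place near-identical windows of one subject in both sets, which is consistent with the higher accuracies reported for panel (a) than for panel (b).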