On the Generalizability of ECG-based Stress Detection Models
Pooja Prajod, Elisabeth André
TL;DR
This study evaluates how well ECG-based stress detection generalizes across datasets by comparing two deep learning approaches with three classifiers based on heart rate variability (HRV) features, using WESAD and SWELL-KW. Generalization is assessed with leave-one-subject-out (LOSO) cross-validation within each dataset and with cross-dataset transfer. HRV-feature models outperform deep learning when transferring to a dataset with different stressors and sensors, while deep learning remains superior within the same dataset. The findings imply that HRV-based approaches are more suitable for deployment in scenarios that differ from the training data, whereas deep ECG methods are advantageous when the input characteristics match the training distribution. The work highlights the need for more diverse datasets and controlled cross-device analyses to improve the generalizability of ECG-based stress recognition systems.
Abstract
Stress is prevalent in many aspects of everyday life, including work, healthcare, and social interactions. Many works have studied handcrafted features from various bio-signals that are indicators of stress. Recently, deep learning models have also been proposed to detect stress. Typically, stress models are trained and validated on the same dataset, which often involves a single stressful scenario. However, it is not practical to collect stress data for every scenario, so it is crucial to study the generalizability of these models and determine to what extent they can be used in other scenarios. In this paper, we explore the generalization capabilities of Electrocardiogram (ECG)-based deep learning models and of models based on handcrafted ECG features, i.e., Heart Rate Variability (HRV) features. To this end, we train three HRV models and two deep learning models that take ECG signals as input. We use ECG signals from two popular stress datasets, WESAD and SWELL-KW, which differ in their stressors and recording devices. First, we evaluate the models using leave-one-subject-out (LOSO) cross-validation, with training and validation samples drawn from the same dataset. Next, we perform a cross-dataset validation of the models: LOSO models trained on the WESAD dataset are validated using SWELL-KW samples and vice versa. While deep learning models achieve the best results on the same dataset, models based on HRV features considerably outperform them on data from a different dataset. This trend is observed for all the models on both datasets. Therefore, HRV models are a better choice for stress recognition in applications whose scenario differs from that of the training dataset. To the best of our knowledge, this is the first work to compare the cross-dataset generalizability of ECG-based deep learning models and HRV models.
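The two evaluation protocols described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy data, the single HRV-like feature, and the threshold classifier are hypothetical placeholders, and all function names (`loso_splits`, `fit_threshold`, `evaluate`) are assumptions made for illustration. In the cross-dataset case, each LOSO model trained on one dataset is scored on all samples of the other dataset.

```python
from statistics import mean


def loso_splits(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx) for each unique subject."""
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


def fit_threshold(features, labels):
    """Toy 1-D classifier (placeholder for a real model):
    threshold at the midpoint between the two class means."""
    stress_mean = mean(f for f, y in zip(features, labels) if y == 1)
    calm_mean = mean(f for f, y in zip(features, labels) if y == 0)
    return (stress_mean + calm_mean) / 2, stress_mean < calm_mean


def predict(model, features):
    threshold, stress_is_lower = model
    return [int((f < threshold) == stress_is_lower) for f in features]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(train_set, test_set=None):
    """Mean accuracy over LOSO folds of train_set.

    test_set is None: each fold is scored on its held-out subject.
    test_set given:   each fold's model is scored on the other dataset.
    """
    features, labels, subjects = train_set
    scores = []
    for _, train_idx, test_idx in loso_splits(subjects):
        model = fit_threshold([features[i] for i in train_idx],
                              [labels[i] for i in train_idx])
        if test_set is None:
            x_test = [features[i] for i in test_idx]
            y_test = [labels[i] for i in test_idx]
        else:
            x_test, y_test, _ = test_set
        scores.append(accuracy(y_test, predict(model, x_test)))
    return mean(scores)


# Hypothetical toy data: (feature values, labels, subject ids); 1 = stress.
dataset_a = ([20, 50, 25, 55, 22, 48], [1, 0, 1, 0, 1, 0],
             ["a1", "a1", "a2", "a2", "a3", "a3"])
dataset_b = ([30, 60, 28, 62], [1, 0, 1, 0], ["b1", "b1", "b2", "b2"])

within_a = evaluate(dataset_a)             # LOSO within one dataset
a_to_b = evaluate(dataset_a, dataset_b)    # train on A, validate on B
b_to_a = evaluate(dataset_b, dataset_a)    # train on B, validate on A
print(within_a, a_to_b, b_to_a)
```

Splitting by subject rather than by sample keeps recordings from the same person out of both the training and validation sets, which is the point of LOSO. With real data, the placeholder classifier would be replaced by the HRV or deep learning models under comparison.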
