Stressor Type Matters! -- Exploring Factors Influencing Cross-Dataset Generalizability of Physiological Stress Detection
Pooja Prajod, Bhargavi Mahesh, Elisabeth André
TL;DR
This study examines how well HRV-based stress detection generalizes across datasets and identifies stressor type as the key factor driving cross-dataset performance. By evaluating random forest (RFC), support vector machine (SVM), and multilayer perceptron (MLP) classifiers on ECG- and BVP-derived HRV features from four public datasets, it shows that models transfer better when the elicited stressor is similar across datasets, while device differences and stress intensity within a moderate range have limited effects. Combining datasets can help or hurt generalizability, depending on how compatible their stressor types and sensors are. The work provides actionable guidance for deploying HRV-based stress detectors in new environments and highlights avenues for extending generalizability analyses to additional modalities and stress types.
Abstract
Automatic stress detection using heart rate variability (HRV) features has gained significant traction because it relies on unobtrusive wearable sensors that measure signals such as the electrocardiogram (ECG) or blood volume pulse (BVP). However, detecting stress from such physiological signals is challenging because the recorded signals vary with factors such as perceived stress intensity and the measurement device. Consequently, stress detection models developed on one dataset may perform poorly on unseen data collected under different conditions. To address this challenge, this study explores the generalizability of machine learning models trained on HRV features for binary stress detection. Our goal extends beyond evaluating generalization performance; we aim to identify the dataset characteristics that most strongly influence generalizability. We leverage four publicly available stress datasets (WESAD, SWELL-KW, ForDigitStress, VerBIO) that differ in at least one characteristic, such as stress elicitation technique, stress intensity, or sensor device. Employing a cross-dataset evaluation approach, we explore which of these characteristics strongly influence model generalizability. Our findings reveal a crucial factor affecting model generalizability: stressor type. Models achieved good performance across datasets when the type of stressor (e.g., social stress in our case) remained consistent. Factors such as stress intensity or the brand of the measurement device had minimal impact on cross-dataset performance. Based on our findings, we recommend matching the stressor type when deploying HRV-based stress models in new environments. To the best of our knowledge, this is the first study to systematically investigate factors influencing the cross-dataset applicability of HRV-based stress models.
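The cross-dataset evaluation idea described above (train on one dataset, test on each of the others) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline. Everything in it is an assumption made for the example: the data are synthetic, the dataset names "A", "B", and "C" are placeholders, the two features are hypothetical HRV-like values, and a simple nearest-centroid classifier stands in for RFC, SVM, and MLP so that the sketch needs only the Python standard library. Dataset "C" is deliberately generated with the opposite class shift, purely to show how a train/test matrix exposes incompatible dataset pairs.

```python
import random
from statistics import mean


def make_dataset(n, shift, seed):
    """Synthetic stand-in for one dataset: a list of (features, label) rows."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2  # 0 = no stress, 1 = stress
        # Two hypothetical HRV-like features; the stress class is shifted by `shift`.
        x = [rng.gauss(label * shift, 1.0), rng.gauss(-label * shift, 1.0)]
        data.append((x, label))
    return data


def fit_centroids(data):
    """Per-class feature means (nearest-centroid stand-in for RFC/SVM/MLP)."""
    centroids = {}
    for label in {y for _, y in data}:
        rows = [x for x, y in data if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def accuracy(centroids, data):
    """Fraction of rows in `data` that the centroid model labels correctly."""
    return mean(1.0 if predict(centroids, x) == y else 0.0 for x, y in data)


def cross_dataset_matrix(datasets):
    """Train on each dataset and test on every other one.

    Returns a dict mapping (train_name, test_name) to accuracy.
    """
    results = {}
    for train_name, train_data in datasets.items():
        model = fit_centroids(train_data)
        for test_name, test_data in datasets.items():
            if test_name != train_name:
                results[(train_name, test_name)] = accuracy(model, test_data)
    return results


# Synthetic placeholders: A and B share the same class shift, C reverses it.
datasets = {
    "A": make_dataset(200, 2.0, seed=0),
    "B": make_dataset(200, 2.0, seed=1),
    "C": make_dataset(200, -2.0, seed=2),
}
results = cross_dataset_matrix(datasets)
for pair, acc in sorted(results.items()):
    print(pair, round(acc, 2))
```

In this toy setup, pairs whose class structure matches should transfer well and mismatched pairs should not, which mirrors the kind of pattern a cross-dataset matrix is meant to surface. The numbers it prints say nothing about the real datasets or the results reported in the paper.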
