Are Anxiety Detection Models Generalizable? A Cross-Activity and Cross-Population Study Using Wearables

Nilesh Kumar Sahu; Snehil Gupta; Haroon R Lone

TL;DR

This study examines the generalizability of anxiety-detection models that use wearable ECG and EDA signals, both across three anxiety-provoking activities and across populations. By training and testing models in within- and cross-activity scenarios and incorporating two public datasets (WESAD and APD), the authors quantify transfer performance: the best AUROC values fall in the 0.62–0.73 range, and recall for the anxious state ranges from about 35% to 74%. Across-participant transfer proves more challenging than within-participant evaluation, though certain activity pairings and multi-feature configurations yield reasonable generalization, especially when ECG-derived HRV features are combined with EDA metrics. The work demonstrates the feasibility of just-in-time anxiety detection in diverse contexts while highlighting methodological considerations such as dataset similarity, class imbalance, and the relative strengths of machine learning (ML) versus deep learning approaches for mobile-health applications. Overall, the findings inform design guidelines for robust, real-time mental-health monitoring and point to transfer learning and multimodal fusion as avenues for improving cross-context transfer.
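The TL;DR attributes part of the generalization to combining ECG-derived HRV features with EDA metrics. Purely as an illustration, the following is a minimal Python sketch of one simple way such an early-fusion feature vector could be built for a single time window. The specific features (mean heart rate, SDNN, and RMSSD from RR intervals; mean, standard deviation, and range of skin conductance) and all function names are assumptions chosen for illustration; they are not the authors' 31 feature sets.

```python
import math
import statistics

# Hypothetical helpers: the feature choice is an illustrative assumption,
# not the feature sets used in the paper.

def hrv_features(rr_ms):
    """Standard time-domain HRV features from RR intervals in milliseconds."""
    if len(rr_ms) < 3:
        raise ValueError("need at least 3 RR intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]  # successive differences
    return {
        "mean_hr_bpm": 60000.0 / statistics.mean(rr_ms),  # 60,000 ms per minute
        "sdnn_ms": statistics.stdev(rr_ms),  # sample std of RR intervals
        "rmssd_ms": math.sqrt(statistics.mean([d * d for d in diffs])),
    }

def eda_features(eda_us):
    """Simple summary statistics of a skin-conductance window (microsiemens)."""
    if len(eda_us) < 2:
        raise ValueError("need at least 2 EDA samples")
    return {
        "eda_mean_us": statistics.mean(eda_us),
        "eda_std_us": statistics.stdev(eda_us),
        "eda_range_us": max(eda_us) - min(eda_us),
    }

def fused_feature_vector(rr_ms, eda_us):
    """Early fusion: concatenate HRV and EDA features in a fixed (sorted) order."""
    feats = {**hrv_features(rr_ms), **eda_features(eda_us)}
    names = sorted(feats)
    return names, [feats[n] for n in names]

if __name__ == "__main__":
    names, vector = fused_feature_vector([800, 810, 790, 805, 795], [2.0, 2.1, 2.3, 2.2])
    for name, value in zip(names, vector):
        print(f"{name}: {value:.3f}")
```

For example, the RR series above has a mean interval of 800 ms, which gives a mean heart rate of 75 bpm. Concatenating per-window features like this is the simplest fusion strategy; the paper's own pipeline may differ.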

Abstract

Anxiety-provoking activities, such as public speaking, can trigger heightened anxiety responses in individuals with anxiety disorders. Recent research suggests that physiological signals collected via wearable devices, including electrocardiogram (ECG) and electrodermal activity (EDA), can be used with machine learning models to detect anxiety in such contexts. However, the generalizability of these anxiety prediction models across different activities and diverse populations remains underexplored, even though it is an essential step for assessing model bias and fostering user trust in broader applications. To address this gap, we conducted a study with 111 participants who engaged in three anxiety-provoking activities. Using both our collected dataset and two well-known publicly available datasets, we evaluated the generalizability of anxiety detection models within participants (for both same-activity and cross-activity scenarios) and across participants (within-activity and cross-activity). In total, we trained and tested 3348 anxiety detection models, one for each combination of six classifiers, 31 feature sets, and 18 train-test configurations (6 × 31 × 18 = 3348). Our results indicate that three key metrics (AUROC, recall for anxious states, and recall for non-anxious states) were slightly above the baseline score of 0.5. The best AUROC scores ranged from 0.62 to 0.73, with recall for the anxious class spanning 35.19% to 74.3%. Interestingly, model performance as measured by AUROC remained relatively stable across different activities and participant groups, though recall for the anxious class did exhibit some variation.
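To make the evaluation quantities concrete, here is a minimal Python sketch, assuming binary labels (1 = anxious, 0 = non-anxious) and one continuous classifier score per window. AUROC is computed from its pairwise (Mann-Whitney) definition rather than any particular library, so a scorer that cannot separate the classes lands exactly at the 0.5 baseline; per-class recall covers the anxious and non-anxious recall figures. The classifier, feature-set, and configuration labels are hypothetical placeholders; only the counts 6, 31, and 18 come from the abstract.

```python
from itertools import product

def auroc(y_true, scores):
    """Probability that a random anxious (1) window outscores a random
    non-anxious (0) window; ties count as half a win.
    O(n_pos * n_neg): fine for a sketch, slow for large datasets."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def class_recall(y_true, y_pred, label):
    """Fraction of samples whose true class is `label` that were predicted as `label`."""
    hits = [p == label for t, p in zip(y_true, y_pred) if t == label]
    if not hits:
        raise ValueError(f"no samples with true label {label}")
    return sum(hits) / len(hits)

def experiment_grid(classifiers, feature_sets, configs):
    """Every (classifier, feature set, train-test configuration) combination."""
    return list(product(classifiers, feature_sets, configs))

# Hypothetical placeholder names; only the counts come from the abstract.
classifiers = [f"clf_{i}" for i in range(6)]
feature_sets = [f"features_{i}" for i in range(31)]
configs = [f"split_{i}" for i in range(18)]
grid = experiment_grid(classifiers, feature_sets, configs)

if __name__ == "__main__":
    print(len(grid))  # 3348
    y_true = [0, 0, 1, 1]
    print(auroc(y_true, [0.1, 0.4, 0.35, 0.8]))  # 0.75
    print(auroc(y_true, [0.5, 0.5, 0.5, 0.5]))  # 0.5, the chance baseline
    print(class_recall(y_true, [0, 1, 1, 0], label=1))  # 0.5
```

Under this definition a constant scorer gets exactly 0.5, which is why best AUROC values of 0.62 to 0.73 read as modestly, rather than strongly, above chance.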
