Are Anxiety Detection Models Generalizable? A Cross-Activity and Cross-Population Study Using Wearables

Nilesh Kumar Sahu, Snehil Gupta, Haroon R Lone

TL;DR

This study examines the generalizability of anxiety-detection models using wearable ECG and EDA signals across three anxiety-provoking activities and across populations. By training and testing across within- and cross-activity scenarios and incorporating two public datasets (WESAD and APD), the authors quantify transfer performance, with best AUROC values in the $0.62$–$0.73$ range and anxious-state recall from $35\%$ to $74\%$. Cross-participant transfer proves more challenging than within-participant evaluation, though certain activity pairings and multi-feature configurations yield reasonable generalization, especially when combining ECG-derived HRV features with EDA metrics. The work demonstrates feasibility for just-in-time anxiety detection in diverse contexts while highlighting methodological considerations, such as dataset similarity, class imbalance, and the relative strengths of ML versus deep learning approaches for mobile-health applications. Overall, the findings inform design guidelines for robust, real-time mental-health monitoring and point to avenues for enhancing cross-context transfer through transfer learning and multimodal fusion.
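For readers less familiar with the two metrics quoted in this summary, the following is a minimal sketch of how AUROC and per-class recall are computed for a binary anxious/non-anxious classifier. The labels, scores, and the 0.5 decision threshold are invented for illustration and are not taken from the paper; the function names `auroc` and `recall` are hypothetical helpers, not the authors' code.

```python
def recall(y_true, y_pred, positive):
    """Fraction of samples whose true label is `positive` that were predicted as `positive`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in relevant) / len(relevant)


def auroc(y_true, scores):
    """Probability that a random anxious sample outscores a random non-anxious one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumption): 1 = anxious, 0 = non-anxious, with class imbalance.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.3, 0.2, 0.1, 0.05]
y_pred = [int(s >= 0.5) for s in scores]  # hypothetical fixed threshold

print("AUROC:", auroc(y_true, scores))
print("Recall (anxious):", recall(y_true, y_pred, positive=1))
print("Recall (non-anxious):", recall(y_true, y_pred, positive=0))
```

In this toy case the ranking quality (AUROC) is fairly high while anxious-class recall at the fixed threshold is low, which illustrates how the two metrics can diverge. AUROC is threshold-free, whereas recall depends on where the decision boundary is placed.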

Abstract

Anxiety-provoking activities, such as public speaking, can trigger heightened anxiety responses in individuals with anxiety disorders. Recent research suggests that physiological signals, including electrocardiogram (ECG) and electrodermal activity (EDA), collected via wearable devices, can be used to detect anxiety in such contexts through machine learning models. However, the generalizability of these anxiety prediction models across different activities and diverse populations remains underexplored, although establishing it is an essential step for assessing model bias and fostering user trust in broader applications. To address this gap, we conducted a study with 111 participants who engaged in three anxiety-provoking activities. Utilizing both our collected dataset and two well-known publicly available datasets, we evaluated the generalizability of anxiety detection models within participants (for both same-activity and cross-activity scenarios) and across participants (within-activity and cross-activity). In total, we trained and tested more than 3348 anxiety detection models (using six classifiers, 31 feature sets, and 18 train-test configurations). Our results indicate that three key metrics (AUROC, recall for anxious states, and recall for non-anxious states) were slightly above the baseline score of 0.5. The best AUROC scores ranged from 0.62 to 0.73, with recall for the anxious class spanning 35.19% to 74.3%. Interestingly, model performance (as measured by AUROC) remained relatively stable across different activities and participant groups, though recall for the anxious class did exhibit some variation.
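The model count in the abstract follows from multiplying the three experimental factors: 6 × 31 × 18 = 3348. The sketch below enumerates such a grid. It assumes that the 31 feature sets are the non-empty combinations of five base feature groups (2^5 − 1 = 31); the abstract states only the count, so this decomposition, the labels `F1`–`F5`, and the placeholder classifier and configuration names are all assumptions made for illustration.

```python
from itertools import combinations, product

# Assumption: five base feature groups whose non-empty subsets give 31 feature sets.
base_feature_groups = ["F1", "F2", "F3", "F4", "F5"]
feature_sets = [
    combo
    for size in range(1, len(base_feature_groups) + 1)
    for combo in combinations(base_feature_groups, size)
]

# Placeholder names; the abstract gives only the counts (6 classifiers, 18 configurations).
classifiers = [f"classifier_{i}" for i in range(1, 7)]
configurations = [f"config_{i}" for i in range(1, 19)]

# One entry per (classifier, feature set, train-test configuration) triple.
grid = list(product(classifiers, feature_sets, configurations))

print(len(feature_sets))  # 31
print(len(grid))          # 3348
```

Since the abstract reports "more than 3348" models, the full grid of 3348 triples is best read as a lower bound on the number of models trained.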

Paper Structure

This paper contains 42 sections, 1 equation, 9 figures, 19 tables.

Figures (9)

  • Figure 1: Overview of our contribution: We conducted a study comprising 40 sessions, where each session involved three participants performing three anxiety-inducing activities: a speech activity (A1, shown in green), a group discussion (A2, shown in blue), and an interview (A3, shown in black). Using the ECG and EDA data collected from participants during these sessions, we investigated our research questions on generalizability. Specifically, we examined generalizability within participants & within activity, as well as within participants & cross activities. To further explore cross-participant & cross-activity generalizability and cross-participant & within activity generalizability, we used two publicly available datasets: WESAD and Anxiety Phase Detection. Best viewed in color.
  • Figure 2: Positions of the ECG and EDA electrodes of the Shimmer sensors on participants' chests and fingers, respectively.
  • Figure 3: Activity sequence followed in each study session. Grey shaded blocks labeled PAQ mark the points at which participants filled out the post-activity questionnaire (PAQ). Blue shaded blocks represent the rest period before the next activity.
  • Figure 4: Mean PAQ scores of anxious and non-anxious participants for five different questions (i.e., Q1, Q2, Q3, Q4, and Q5) during (a) Activity 1 (Speech), (b) Activity 2 (Group discussion), and (c) Activity 3 (Interview). (d) shows the mean across all five questions for Activities 1, 2, and 3 separately.
  • Figure 5: t-SNE plots, i.e., low-dimensional representations of the high-dimensional feature sets (a) F1, (b) F2, (c) F3, (d) F4, and (e) F5 collected during Activity 1 (blue), Activity 2 (orange), and Activity 3 (green). Best viewed in color.
  • ...and 4 more figures