Cross-Modality Investigation on WESAD Stress Classification

Eric Oliver, Sagnik Dakshit

TL;DR

This study develops and evaluates transformer-based classifiers for three-class stress detection on the WESAD dataset using six physiological modalities, demonstrating near-perfect intra-modality performance and robust cross-modality transfer for ECG, EDA, RESP, TEMP, and EMG. It introduces a patch-based 1D-transformer architecture with positional encoding and self-attention, trained on raw signals with minimal preprocessing. Through embedding-space visualizations (UMAP) and quantitative variance analysis, the work explains why certain modalities generalize across sensors while accelerometer data underperform due to higher variance. The findings establish state-of-the-art accuracy on WESAD for multiclass stress detection and offer actionable insights for cross-modal deployment of wearable-based stress monitoring systems.

Abstract

Deep learning's growing prevalence has driven its widespread use in healthcare, where AI and sensor advancements enhance diagnosis, treatment, and monitoring. In mobile health, AI-powered tools enable early diagnosis and continuous monitoring of conditions like stress. Wearable technologies and multimodal physiological data have made stress detection increasingly viable, but model efficacy depends on data quality, quantity, and modality. This study develops transformer models for stress detection using the WESAD dataset, training on electrocardiogram (ECG), electrodermal activity (EDA), electromyography (EMG), respiration rate (RESP), temperature (TEMP), and 3-axis accelerometer (ACC) signals. The results demonstrate the effectiveness of single-modality transformers in analyzing physiological signals, achieving state-of-the-art performance with accuracy, precision, and recall values ranging from $99.73\%$ to $99.95\%$ for stress detection. Furthermore, this study explores cross-modal performance and explains it using 2D visualization of the learned embedding space and a quantitative analysis of data variance. Despite the large body of work on stress detection and monitoring, the robustness and generalization of these models across different modalities have not been explored. This research represents one of the initial efforts to interpret embedding spaces for stress detection, providing valuable information on cross-modal performance.
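
To make the "patch-based 1D transformer with positional encoding" idea from the TL;DR concrete, the sketch below shows one common way to turn a raw 1D signal window into a sequence of position-aware tokens. It is an illustration under stated assumptions, not the authors' implementation: the patch length, the use of non-overlapping patches, the fixed sinusoidal encoding, and the helper names (`make_patches`, `sinusoidal_positions`, `add_positions`) are all hypothetical, and the raw patch stands in for a learned embedding.

```python
import math

def make_patches(signal, patch_len):
    """Split a 1D signal into non-overlapping patches, dropping any trailing remainder."""
    n_patches = len(signal) // patch_len
    return [signal[i * patch_len:(i + 1) * patch_len] for i in range(n_patches)]

def sinusoidal_positions(n_positions, dim):
    """Fixed sinusoidal positional encoding; dim must be even (assumed encoding type)."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    table = []
    for pos in range(n_positions):
        row = []
        for i in range(0, dim, 2):
            angle = pos / (10000 ** (i / dim))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table

def add_positions(patches):
    """Add a positional encoding to each patch, using the raw patch as its own embedding."""
    dim = len(patches[0])
    pos = sinusoidal_positions(len(patches), dim)
    return [[x + p for x, p in zip(patch, row)] for patch, row in zip(patches, pos)]

# Hypothetical toy window: 10 samples with patch length 4 gives 2 patches (2 samples dropped).
window = [0.1 * k for k in range(10)]
patches = make_patches(window, patch_len=4)
tokens = add_positions(patches)
print(len(tokens), len(tokens[0]))
```

In a full model, each patch would typically pass through a learned linear projection before the positional term is added, and the resulting token sequence would feed a self-attention encoder followed by a three-class classification head.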

Paper Structure

This paper contains 20 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Our Proposed Transformer Architecture
  • Figure 2: UMAP visualization of the embedding space for each modality. Only Channel 1 is shown for ACC because it performs worst of the three channels.
  • Figure 3: Comparison of reported accuracy and F1 scores of existing research on WESAD.
  • Figure 4: Comparison of accuracy, precision, and recall of existing research on WESAD.