Dynamic Stress Detection: A Study of Temporal Progression Modelling of Stress in Speech
Vishakha Lall, Yisi Liu
TL;DR
This work reframes speech-based stress detection as a dynamic, temporally evolving task rather than a static labelling problem. It introduces a Stress Progression Labelling Framework that derives fine-grained stress annotations from evolving emotional states, and uses cross-attention sequence models (a Unidirectional LSTM and a Transformer Encoder) to capture temporal stress progression across multiple feature representations. The approach improves accuracy over existing baselines on MuSE (+5%) and StressID (+18%) and generalises to a real-world dataset with EEG-derived ground truth, supporting the view of stress as a dynamic construct with practical implications for real-time monitoring. The findings highlight the value of temporal context and self-supervised features (e.g., HuBERT, Wav2Vec 2.0) for robust stress detection, and suggest tuning the temporal window per dataset before deployment.
Abstract
Detecting psychological stress from speech is critical in high-pressure settings. While prior work has leveraged acoustic features for stress detection, most treat stress as a static label. In this work, we model stress as a temporally evolving phenomenon influenced by historical emotional state. We propose a dynamic labelling strategy that derives fine-grained stress annotations from emotional labels and introduce cross-attention-based sequential models, a Unidirectional LSTM and a Transformer Encoder, to capture temporal stress progression. Our approach achieves notable accuracy gains on MuSE (+5%) and StressID (+18%) over existing baselines, and generalises well to a custom real-world dataset. These results highlight the value of modelling stress as a dynamic construct in speech.
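To make the contrast between static and history-dependent labelling concrete, the following is a minimal illustrative sketch, not the labelling framework proposed in this work. It assumes a hypothetical per-segment emotion intensity score in [0, 1], a hypothetical window length, and a hypothetical threshold; the function name `progressive_stress_labels` is likewise invented for illustration. Each segment's label depends on the recent emotional history rather than on the current segment alone.

```python
from typing import List


def progressive_stress_labels(
    emotion_scores: List[float],
    window: int = 3,
    threshold: float = 0.5,
) -> List[int]:
    """Label each segment from a trailing window of emotion scores.

    Illustrative assumption: stress at segment t is the mean of the
    last `window` emotion scores (including t), thresholded to 0/1.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    labels = []
    for t in range(len(emotion_scores)):
        # Trailing history; shorter than `window` at the start of the sequence.
        history = emotion_scores[max(0, t - window + 1): t + 1]
        smoothed = sum(history) / len(history)
        labels.append(1 if smoothed >= threshold else 0)
    return labels


# Hypothetical scores for seven consecutive speech segments.
scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.1]

# window=1 reduces to a static, per-segment label.
static_labels = progressive_stress_labels(scores, window=1)
# window=3 lets recent history delay onset and prolong the stressed state.
dynamic_labels = progressive_stress_labels(scores, window=3)
```

With these toy values, the windowed labels switch on one segment later than the static ones and stay on for one segment after the score drops, which is the kind of carry-over from earlier emotional state that a static label cannot express.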
