Dynamic Stress Detection: A Study of Temporal Progression Modelling of Stress in Speech

Vishakha Lall, Yisi Liu

TL;DR

This work reframes speech-based stress detection as a dynamic, temporally evolving task rather than a static labelling problem. It introduces a Stress Progression Labelling Framework that derives fine-grained stress annotations from evolving emotional states, and uses cross-attention sequence models (a Unidirectional LSTM and a Transformer Encoder) to capture temporal stress progression across multiple feature representations. The approach improves accuracy over baselines on MuSE (+5%) and StressID (+18%) and generalises to a real-world dataset with EEG-derived ground truth, supporting the view of stress as a dynamic construct with practical implications for real-time monitoring. The findings highlight the value of temporal context and self-supervised features (e.g., HuBERT, Wav2Vec 2.0) for robust stress detection, and suggest tuning temporal windows per dataset for deployment.

Abstract

Detecting psychological stress from speech is critical in high-pressure settings. While prior work has leveraged acoustic features for stress detection, most approaches treat stress as a static label. In this work, we model stress as a temporally evolving phenomenon influenced by past emotional states. We propose a dynamic labelling strategy that derives fine-grained stress annotations from emotional labels and introduce two cross-attention-based sequential models, a Unidirectional LSTM and a Transformer Encoder, to capture temporal stress progression. Our approach achieves notable accuracy gains on MuSE (+5%) and StressID (+18%) over existing baselines, and generalises well to a custom real-world dataset. These results highlight the value of modelling stress as a dynamic construct in speech.
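To make the idea of temporal context concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a long recording is cut into fixed-length, non-overlapping segments and that each segment is paired only with the segments that precede it, which mirrors the causal view a unidirectional sequence model would have. The function name `segment_with_history` and the parameters `window` and `history` are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence, Tuple


def segment_with_history(
    samples: Sequence[float],
    window: int,
    history: int,
) -> List[Tuple[List[Sequence[float]], Sequence[float]]]:
    """Pair each segment with up to `history` earlier segments.

    Hypothetical helper: `window` is the segment length in samples and
    `history` is the number of preceding segments used as context.
    """
    if window <= 0 or history < 0:
        raise ValueError("window must be positive and history non-negative")

    # Non-overlapping segments; a trailing partial segment is kept as-is.
    segments = [samples[i:i + window] for i in range(0, len(samples), window)]

    pairs = []
    for t, current in enumerate(segments):
        # Causal context: only segments strictly before position t.
        past = segments[max(0, t - history):t]
        pairs.append((list(past), current))
    return pairs


if __name__ == "__main__":
    # Toy "audio" of 10 samples, segments of 4 samples, 2 segments of context.
    for past, current in segment_with_history(list(range(10)), window=4, history=2):
        print(len(past), list(current))
```

In this sketch the first segment has no context, and later segments see a growing (then capped) history. Both `window` and `history` would need to be tuned per dataset, in line with the TL;DR's remark on dataset-specific temporal windows.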

Paper Structure

This paper contains 14 sections, 4 equations, 2 figures, 6 tables.

Figures (2)

  • Figure 1: Temporal segmentation of long audio sequences
  • Figure 2: Model during training and inference