Multi-scale Transformer-based Network for Emotion Recognition from Multi Physiological Signals
Tu Vu, Van Thong Huynh, Soo-Hyung Kim
TL;DR
The paper tackles continuous emotion estimation from eight physiological signals, using the CASE dataset as provided for the EPiC competition. It introduces a multi-scale Transformer-based architecture that encodes the signals at three temporal scales ($L$, $L/2$, $L/4$), applying Gaussian random-feature augmentation and a Transformer encoder at each scale. The encoded scales are concatenated and passed through fully connected layers to predict Valence and Arousal within $[0.5, 9.5]$. On the EPiC CASE dataset, the approach achieves an overall RMSE of $1.45$. Performance is strongest in the Across-subject scenario (RMSE of about $1.345$ for Valence and $1.336$ for Arousal) and comparatively weaker for unseen elicitors. The work demonstrates the viability of end-to-end, multi-scale Transformer pipelines for multimodal physiological emotion recognition, with potential implications for real-time affective computing in healthcare and human-computer interaction.
Abstract
This paper presents an efficient multi-scale Transformer-based approach to emotion recognition from physiological data. The task has gained widespread attention in the research community because modern sensors and machine learning techniques can extract a vast amount of information from these signals. Our approach combines a multi-modal technique with scaling of the data to establish the relationship between internal body signals and human emotions. We additionally use Transformer and Gaussian transformation techniques to improve signal encoding and overall performance. On the CASE dataset of the EPiC competition, our model achieves an RMSE of 1.45.
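To make the multi-scale idea and the evaluation metric concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the summary does not say how the $L/2$ and $L/4$ views are produced or how predictions are kept inside $[0.5, 9.5]$, so non-overlapping average pooling and a sigmoid squashing are assumptions made here for illustration. The function names (`average_pool`, `multi_scale_views`, `to_rating_range`, `rmse`) are hypothetical, and the Transformer encoder and Gaussian transformation are omitted entirely.

```python
import math
from typing import List, Sequence


def average_pool(signal: Sequence[float], factor: int) -> List[float]:
    """Downsample by averaging non-overlapping windows of `factor` samples.

    Assumption: the source does not specify the downsampling method.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    usable = len(signal) - len(signal) % factor  # drop any ragged tail
    return [
        sum(signal[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]


def multi_scale_views(signal: Sequence[float],
                      factors: Sequence[int] = (1, 2, 4)) -> List[List[float]]:
    """Return views of the signal at lengths L, L/2 and L/4."""
    return [average_pool(signal, f) for f in factors]


def to_rating_range(x: float, lo: float = 0.5, hi: float = 9.5) -> float:
    """Map an unbounded model output into [lo, hi] with a sigmoid.

    Assumption: the source only states the target range, not the mapping.
    """
    if x >= 0:
        s = 1.0 / (1.0 + math.exp(-x))
    else:  # numerically stable branch for negative inputs
        e = math.exp(x)
        s = e / (1.0 + e)
    return lo + (hi - lo) * s


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal length")
    return math.sqrt(
        sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    )
```

In a full pipeline, each of the three views would be encoded separately and the encodings concatenated before the fully connected layers; the `rmse` helper corresponds to the metric behind the reported score of 1.45.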
