ECG-EmotionNet: Nested Mixture of Expert (NMoE) Adaptation of ECG-Foundation Model for Driver Emotion Recognition
Nastaran Mansourian, Arash Mohammadi, M. Omair Ahmad, M. N. S. Swamy
TL;DR
ECG-EmotionNet addresses driver emotion recognition under dynamic driving conditions by adapting a pre-trained ECG foundation model through a nested Mixture of Experts that fuses embeddings from all transformer layers while keeping the backbone frozen. This yields richer global and local feature representations from single-channel ECG with significantly fewer trainable parameters. On the manD 1.0 benchmark, it achieves an average accuracy of 82.45% and an F1 score of 77.11% across five emotions, outperforming static-environment baselines at a fraction of the training cost. The approach offers practical benefits for advanced driver-assistance systems (ADAS) and human-autonomy teaming (HAT) in autonomous driving, with robustness to noise and efficient computation, and points to future multimodal integration and real-time deployment.
Abstract
Driver emotion recognition plays a crucial role in driver monitoring systems, enhancing human-autonomy interactions and the trustworthiness of Autonomous Driving (AD). Various physiological and behavioural modalities have been explored for this purpose, with Electrocardiogram (ECG) emerging as a standout choice for real-time emotion monitoring, particularly in dynamic and unpredictable driving conditions. Existing methods, however, often rely on multi-channel ECG signals recorded under static conditions, limiting their applicability in real-world dynamic driving scenarios. To address this limitation, the paper introduces ECG-EmotionNet, a novel architecture designed specifically for emotion recognition in dynamic driving environments. ECG-EmotionNet is constructed by adapting a recently introduced ECG Foundation Model (FM) and uniquely employs single-channel ECG signals, ensuring both robust generalizability and computational efficiency. Unlike conventional adaptation methods such as full fine-tuning, linear probing, or low-rank adaptation, the paper proposes an intuitive alternative, referred to as nested Mixture of Experts (MoE) adaptation. More precisely, each transformer layer of the underlying FM is treated as a separate expert, and the embeddings extracted from these experts are fused using trainable weights within a gating mechanism. This approach enhances the representation of both global and local ECG features, leading to a 6% improvement in accuracy and a 7% increase in the F1 score, all while maintaining computational efficiency. The effectiveness of the proposed ECG-EmotionNet architecture is evaluated using a recently introduced and challenging driver emotion monitoring dataset.
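
The layer-as-expert fusion described in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible gate: one trainable scalar logit per transformer layer, normalized with a softmax, and a weighted sum of the per-layer embeddings. The abstract does not specify the gate's form (it may, for instance, depend on the input), so the function names, the parameterization, and the toy values below are hypothetical and are not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layer_embeddings(layer_embeddings, gate_logits):
    """Fuse one embedding per (frozen) transformer layer into a single vector.

    layer_embeddings: list of L vectors, each of length D (one per layer).
    gate_logits: list of L scalars; in training these would be the only
        learnable parameters of this step (hypothetical parameterization).
    """
    if len(layer_embeddings) != len(gate_logits):
        raise ValueError("need exactly one gate logit per layer")
    weights = softmax(gate_logits)
    dim = len(layer_embeddings[0])
    fused = [0.0] * dim
    # Weighted sum over layers: fused = sum_l weights[l] * embedding_l
    for weight, embedding in zip(weights, layer_embeddings):
        for i in range(dim):
            fused[i] += weight * embedding[i]
    return fused, weights


# Toy example: 3 "layers" with 2-dimensional embeddings.
layer_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gate_logits = [0.0, 0.0, 0.0]  # equal logits give uniform gate weights
fused, weights = fuse_layer_embeddings(layer_embeddings, gate_logits)
```

Because the backbone stays frozen, only the gate parameters (and any downstream classifier) would receive gradients in such a setup, which is consistent with the small trainable-parameter count the abstract emphasizes.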
