Synheart Emotion: Privacy-Preserving On-Device Emotion Recognition from Biosignals
Henok Ademtew, Israel Goytom
TL;DR
This work tackles the privacy and latency challenges of emotion recognition by enabling fully on-device inference from wrist photoplethysmography. Through a comprehensive benchmark across classical machine learning, deep learning, and transformer architectures, the authors show that classical ensembles, particularly ExtraTrees and XGBoost, outperform neural networks on small physiological datasets, achieving macro F1 scores up to 0.826 with combined chest and wrist sensors and 0.623 with wrist-only data. The study demonstrates a practical deployment path via ONNX optimization, delivering sub-millisecond latency and modest storage footprints (as low as a few MB) on consumer wearables, with energy consumption around 95 μJ per inference. These findings challenge the assumption that deep learning is universally superior in small-sample biomedical tasks and establish a privacy-preserving, real-time reference architecture for wearable affective computing, while acknowledging the limitations of HRV and the need for multimodal fusion to cover richer emotion spaces.
Abstract
Human-computer interaction increasingly demands systems that recognize not only explicit user inputs but also implicit emotional states. While substantial progress has been made in affective computing, most emotion recognition systems rely on cloud-based inference, which introduces privacy vulnerabilities and latency unsuitable for real-time applications. This work presents a comprehensive evaluation of machine learning architectures for on-device emotion recognition from wrist-based photoplethysmography (PPG), systematically comparing models spanning classical ensemble methods, deep neural networks, and transformers on the WESAD stress detection dataset. Results demonstrate that classical ensemble methods substantially outperform deep learning on small physiological datasets, with ExtraTrees achieving F1 = 0.826 on combined features and F1 = 0.623 on wrist-only features, compared to transformers achieving only F1 = 0.509–0.577. We deploy the wrist-only ExtraTrees model optimized via ONNX conversion, achieving a 4.08 MB footprint, 0.05 ms inference latency, and 152x speedup over the original implementation. Furthermore, ONNX optimization yields a 30.5% average storage reduction and 40.1x inference speedup, highlighting the feasibility of privacy-preserving on-device emotion recognition for real-world wearables.
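
The F1 figures quoted above are described in the TL;DR as macro F1, the unweighted mean of per-class F1 scores. The following is a minimal sketch of that metric in plain Python and is not the authors' evaluation code. The function name `macro_f1`, the class labels, and the toy predictions are all illustrative assumptions; the class set used in the paper is not stated here.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    # Consider every class that appears in either the labels or the predictions.
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class is never hit.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels and predictions, for illustration only.
y_true = ["baseline", "baseline", "stress", "stress", "amusement", "amusement"]
y_pred = ["baseline", "stress", "stress", "stress", "amusement", "baseline"]

# Per-class F1: amusement 2/3, baseline 1/2, stress 4/5.
print(round(macro_f1(y_true, y_pred), 3))  # prints 0.656
```

Because every class contributes equally regardless of how many samples it has, macro averaging penalizes a model that performs well only on the majority class, which matters on small, imbalanced physiological datasets.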
