Synheart Emotion: Privacy-Preserving On-Device Emotion Recognition from Biosignals

Henok Ademtew, Israel Goytom

TL;DR

This work tackles the privacy and latency challenges of emotion recognition by enabling fully on-device inference from wrist photoplethysmography. Through a comprehensive benchmark across classical ML, DL, and transformer architectures, the authors show that classical ensembles, particularly ExtraTrees and XGBoost, outperform neural networks on small physiological datasets, achieving macro F1 scores up to 0.826 with combined chest and wrist sensors and 0.623 with wrist-only data. The study demonstrates a practical deployment path via ONNX optimization, delivering sub-millisecond latency and modest storage footprints (as low as a few MB) on consumer wearables, with energy consumption around 95 μJ per inference. These findings challenge the notion of deep-learning universality in small-sample biomedical tasks and establish a privacy-preserving, real-time reference architecture for wearable affective computing, while acknowledging HRV limitations and the need for multimodal fusion for richer emotion spaces.

Abstract

Human-computer interaction increasingly demands systems that recognize not only explicit user inputs but also implicit emotional states. While substantial progress has been made in affective computing, most emotion recognition systems rely on cloud-based inference, introducing privacy vulnerabilities and latency constraints unsuitable for real-time applications. This work presents a comprehensive evaluation of machine learning architectures for on-device emotion recognition from wrist-based photoplethysmography (PPG), systematically comparing different models spanning classical ensemble methods, deep neural networks, and transformers on the WESAD stress detection dataset. Results demonstrate that classical ensemble methods substantially outperform deep learning on small physiological datasets, with ExtraTrees achieving F1 = 0.826 on combined features and F1 = 0.623 on wrist-only features, compared to transformers achieving only F1 = 0.509-0.577. We deploy the wrist-only ExtraTrees model optimized via ONNX conversion, achieving a 4.08 MB footprint, 0.05 ms inference latency, and 152x speedup over the original implementation. Furthermore, ONNX optimization yields a 30.5% average storage reduction and 40.1x inference speedup, highlighting the feasibility of privacy-preserving on-device emotion recognition for real-world wearables.

Paper Structure

This paper contains 37 sections, 3 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Synheart Emotion system architecture. Left: Smartwatch with PPG sensor captures blood volume pulse at 64 Hz. Center: On-device processing pipeline showing four stages: (1) Signal preprocessing (bandpass filter 0.5-8 Hz), (2) Peak detection (adaptive thresholding to extract RR intervals), (3) Feature extraction (compute SDNN, RMSSD, pNN50, Mean RR, Mean HR), (4) Classification (ExtraTrees ONNX model 2.41 MB, 0.04ms inference). Right: Output shows emotion label (Baseline, Stress, or Amusement) with confidence score. Dashed boundary indicates all computation occurs on-device with no cloud transmission. All models converted to ONNX for cross-platform deployment.
  • Figure 2: Distribution of HRV features across emotional states in WESAD dataset (488 samples, 12 subjects). Five box plots show median, quartiles, and outliers for Baseline (green), Stress (red), and Amusement (blue). Contrary to typical expectations, SDNN and RMSSD exhibit higher values during Stress (medians: 301.6 ms, 397.0 ms) compared to Baseline (210.5 ms, 287.8 ms), reflecting high inter-subject variability and motion artifacts in wrist PPG. pNN50 shows significant separation ($p<0.001$) with Stress highest (84.8%). Mean_RR and HR_mean show minimal class differences ($p>0.05$), confirming limited discriminative value. Statistical significance via Kruskal-Wallis: *** $p<0.001$ for SDNN, RMSSD, pNN50; ns (not significant) for Mean_RR, HR_mean.
  • Figure 3: Multi-panel visualization of the physiological signal processing pipeline. Top: Raw wrist PPG waveform (64 Hz) with detected peaks marked by red dots. Second: Extracted inter-beat intervals (IBI) showing time between consecutive heartbeats. Lower panels: Five computed HRV features (SDNN, RMSSD, pNN50, Mean_RR, Mean_HR) from 120-second windows across three emotional states. The Stress condition shows reduced SDNN and RMSSD compared to Baseline, consistent with sympathetic nervous system activation and parasympathetic withdrawal. All features computed using artifact-rejected IBIs ($<300$ ms or $>2000$ ms excluded).
  • Figure 4: Model efficiency frontier: F1 score versus model size for the WRIST_ALL scenario. Left panel shows original formats (PKL/PTH), right shows ONNX-optimized models. The green dashed line indicates the Pareto frontier (Linear SVM $\rightarrow$ Simple MLP $\rightarrow$ XGBoost). The shaded region marks the desirable deployment zone ($\mathrm{F1}>0.65$, $<500$ KB). XGBoost achieves the best F1 score (0.685) under 500 KB. ExtraTrees was deployed despite its larger size (4.18 MB, ONNX) for robustness and a 152$\times$ speedup. Classical ML (blue circles) outperforms neural networks (red squares) on this small dataset (488 samples).
  • Figure 5: Heatmap confusion matrices showing classification performance. Panel (a): XGBoost using all wrist-based HRV features. Diagonal shows correct predictions: Baseline 95%, Stress 67%, Amusement 69%. Main confusion occurs between Stress and Baseline (28% misclassification) and between Stress and Amusement (6%). Panel (b): Comparative matrices for Classical ML, Neural Network, and Transformer architectures, showing classical ML achieves highest diagonal values.
  • ...and 1 more figure
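
The feature-extraction stage named in the captions for Figures 1 and 3 (SDNN, RMSSD, pNN50, Mean RR, Mean HR, computed from artifact-rejected inter-beat intervals) can be sketched as follows. This is a minimal illustration using the standard time-domain HRV definitions, not the authors' implementation. The function name `hrv_features` and its parameters are hypothetical. The use of the sample standard deviation for SDNN, the derivation of mean heart rate from mean RR, and taking successive differences across rejected beats are assumptions that the captions do not specify.

```python
from math import sqrt
from statistics import mean, stdev


def hrv_features(ibis_ms, lo_ms=300.0, hi_ms=2000.0):
    """Compute five time-domain HRV features from inter-beat intervals in ms.

    Hypothetical helper: the 300/2000 ms bounds follow the Figure 3 caption;
    everything else is an assumption based on standard HRV definitions.
    """
    # Artifact rejection: discard intervals shorter than lo_ms or longer than hi_ms.
    rr = [x for x in ibis_ms if lo_ms <= x <= hi_ms]
    if len(rr) < 3:
        raise ValueError("need at least 3 valid inter-beat intervals")

    # Successive differences between neighbouring valid intervals.
    # Simplification: a rejected beat leaves a gap that is bridged here.
    diffs = [b - a for a, b in zip(rr, rr[1:])]

    mean_rr = mean(rr)
    return {
        # Standard deviation of the intervals (sample SD, an assumption).
        "SDNN": stdev(rr),
        # Root mean square of successive differences.
        "RMSSD": sqrt(mean([d * d for d in diffs])),
        # Percentage of successive differences larger than 50 ms.
        "pNN50": 100.0 * sum(abs(d) > 50.0 for d in diffs) / len(diffs),
        "Mean_RR": mean_rr,
        # Mean heart rate in beats per minute, approximated from mean RR.
        "Mean_HR": 60000.0 / mean_rr,
    }


if __name__ == "__main__":
    # Toy intervals for illustration only; 2500 ms is rejected as an artifact.
    for name, value in hrv_features([800, 810, 790, 2500, 860, 800]).items():
        print(f"{name}: {value:.2f}")
```

In a pipeline like the one in Figure 1, a function of this kind would sit between peak detection and the classifier, run once per analysis window (120 seconds according to the Figure 3 caption), and emit the five values that the model consumes.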