Wearable data from subjects playing Super Mario, sitting university exams, or performing physical exercise help detect acute mood episodes via self-supervised learning

Filippo Corponi, Bryan M. Li, Gerard Anmella, Clàudia Valenzuela-Pascual, Ariadna Mas, Isabella Pacchiarotti, Marc Valentí, Iria Grande, Antonio Benabarre, Marina Garriga, Eduard Vieta, Allan H Young, Stephen M. Lawrie, Heather C. Whalley, Diego Hidalgo-Mazzei, Antonio Vergari

TL;DR

This work addresses detecting acute mood episodes in mood disorders using wearable data while mitigating the annotated-data bottleneck through self-supervised learning (SSL). It builds E4SelfLearning from 11 open-access Empatica E4 datasets (161 subjects) and introduces an E4mer Transformer for the target task of distinguishing acute mood episodes from euthymia, using SSL pre-training with unlabelled data followed by supervised fine-tuning. The study shows that masked-prediction SSL achieves higher segment- and subject-level accuracy (ACC_segment ≈ 0.812 and ACC_subject ≈ 0.906) than fully supervised E4mer and XGBoost baselines, and that SSL gains scale with unlabelled data availability and depend on the chosen pretext task. By analyzing learned embeddings and providing open access to the pre-processing pipeline and data, the work demonstrates SSL as a viable path to reduce annotation demands and advance clinical deployment of personal-sensing tools for mood disorders.

Abstract

Personal sensing, leveraging data passively and near-continuously collected with wearables from patients in their ecological environment, is a promising paradigm to monitor mood disorders (MDs), a major determinant of worldwide disease burden. However, collecting and annotating wearable data is very resource-intensive. Studies of this kind can thus typically afford to recruit only a couple of dozen patients. This constitutes one of the major obstacles to applying modern supervised machine learning techniques to MD detection. In this paper, we overcome this data bottleneck and advance the detection of MD acute episodes vs stable states from wearable data by building on recent advances in self-supervised learning (SSL). SSL leverages unlabelled data to learn representations during pre-training, which are subsequently exploited for a supervised task. First, we collected open-access datasets recorded with an Empatica E4 and spanning different personal sensing tasks unrelated to MD monitoring, from emotion recognition in Super Mario players to stress detection in undergraduates, and devised a pre-processing pipeline performing on-/off-body detection, sleep-wake detection, segmentation, and (optionally) feature extraction. With 161 E4-recorded subjects, we introduce E4SelfLearning, the largest open-access collection to date, and its pre-processing pipeline. Second, we show that SSL confidently outperforms fully-supervised pipelines using either our novel E4-tailored Transformer architecture (E4mer) or the classical baseline XGBoost: 81.23% of recording segments correctly classified, against 75.35% (E4mer) and 72.02% (XGBoost), on data from 64 patients (half acute, half stable). Lastly, we illustrate that SSL performance is strongly associated with the specific surrogate task employed for pre-training as well as with unlabelled data availability.
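The results are reported both per recording segment and per subject. As a minimal sketch of how the two levels can relate, the Python below computes segment accuracy directly and subject accuracy by majority vote over each subject's segments, counting a subject as correct when more than half of their segments are correctly classified. The function names, the toy labels, and the handling of ties (counted as incorrect) are assumptions for illustration, not the authors' code.

```python
from collections import defaultdict


def segment_accuracy(y_true, y_pred):
    """Fraction of segments whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def subject_accuracy(subject_ids, y_true, y_pred):
    """Fraction of subjects whose majority vote over segments is correct.

    With two classes, the majority vote matches the true state exactly when
    more than half of the subject's segments are classified correctly.
    Ties are counted as incorrect (an assumption of this sketch).
    """
    hits = defaultdict(int)    # correctly classified segments per subject
    totals = defaultdict(int)  # all segments per subject
    for sid, t, p in zip(subject_ids, y_true, y_pred):
        totals[sid] += 1
        hits[sid] += int(t == p)
    correct_subjects = sum(hits[s] / totals[s] > 0.5 for s in totals)
    return correct_subjects / len(totals)


# Hypothetical toy data: 1 = acute episode, 0 = euthymia.
subject_ids = ["a", "a", "a", "b", "b", "b", "c", "c"]
y_true = [1, 1, 1, 0, 0, 0, 1, 1]
y_pred = [1, 1, 0, 0, 1, 1, 1, 1]

acc_segment = segment_accuracy(y_true, y_pred)
acc_subject = subject_accuracy(subject_ids, y_true, y_pred)
print(acc_segment, acc_subject)
```

In this toy example, subject "b" has only one of three segments right and is therefore misclassified at the subject level, even though that one segment still counts towards segment accuracy.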

Paper Structure (3 sections, 2 equations, 5 figures, 10 tables)

This paper contains 3 sections, 2 equations, 5 figures, 10 tables.

Figures (5)

  • Figure 1: A total of 6267 hours ($\sim$261 days) of unlabelled recordings from 252 subjects while awake were used for self-supervised pre-training. Unlabelled data comprised a collection of eleven open-access datasets, whose pre-processed version we make publicly available (E4SelfLearning), along with the part of the INTREPIBD study that was not relevant for the target task under investigation, i.e. acute episode vs euthymia classification. Unlabelled data were passed through a model consisting of an encoder and a transform head for self-supervised pre-training; the pre-trained encoder block was then retained for the target task, while the transform head was replaced with a new, randomly initialized classification head. N: number of subjects; H$_{\text{w}}$: number of waking hours; $^*$ the figures reported here for the INTREPIBD study do not include the target task (labelled) training set, which was also used during self-supervised pre-training.
  • Figure 2: E4mer is a Transformer model tailored to the Empatica E4 input data. E4mer consists of three sequential modules: 1) Channel embeddings set in parallel, one for each Empatica E4 raw input channel (i.e. ACL$_{\mathsf{x}}$, ACL$_{\mathsf{y}}$, ACL$_{\mathsf{z}}$, BVP, EDA, TEMP), extracting features and mapping channels to tensors of dimensionality ($B$ = batch size, $N$ = number of time steps, $F$ = number of filters) so that they can be conveniently concatenated along the dimension $F$; 2) a Representation Module learning contextual representations of the input time steps within the input segment through the multi-head self-attention mechanism; 3) a Classification Head outputting probabilities for the two target classes, i.e. acute episode and euthymia. The self-supervised learning models employed in our experiments feature the same E4mer architecture, except that the Classification Head is replaced with a Transform Head projecting onto a label space compatible with the pretext task at hand.
  • Figure 3: Surrogate tasks used for self-supervised pre-training. Masked prediction: grey-shaded areas correspond to zeroed-out time-series portions; the model is tasked with minimizing the distance between the original time-series and the one imputed at the masked areas. Transformation prediction: the figure shows the types of transformations applied to input time-series; given transformed channels, the model was trained to learn which transformation each channel underwent.
  • Figure 4: Self-supervised learning correctly classifies six more subjects than supervised learning. Segment accuracy ($\text{ACC}_{\text{segment}}$) under self-supervised learning and supervised learning (E4mer) within each subject's test segments. Subjects in euthymia are represented as triangles, while subjects in an acute episode are shown as circles with the left (right) half coloured in blue (red) with a gradient proportional to the total score on the Hamilton Depression Rating Scale-17 (Young Mania Rating Scale), a doctor-administered questionnaire gauging depression (mania) severity. A subject's position on the x (y) axis corresponds to their proportion of recording segments correctly classified by supervised (self-supervised) learning. Note that a subject's majority vote over their segments agrees with the subject's true mood state when the proportion of correctly classified segments from that subject is greater than 0.5. The HDRS (YMRS) range shown on the colorbar refers to values scored in the INTREPIBD sample, while the total score in general can range between [0-52] ([0-60]).
  • Figure 5: Reassuringly, the learned embeddings seem to have captured meaningful semantics about the underlying context. Top left: embeddings from the encoder pre-trained on masked prediction map sleep and wake segments to different parts of the latent space. Top right: embeddings from the encoder fine-tuned on the target task show that segments from the unlabelled open-access datasets, which presumably do not contain subjects in an acute mood episode, tend to cluster with part of the segments from patients in euthymia. Bottom left (right): embeddings from the fine-tuned encoder show a gradient in symptom severity across target task segments, as revealed by the Hamilton Depression Rating Scale-17 (Young Mania Rating Scale) total score. Note that unlabelled segments are not shown in the bottom left (right) plot and that the HDRS (YMRS) range shown on the colorbar refers to values scored in the INTREPIBD sample, while the total score in general can range between [0-52] ([0-60]).
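
The masked-prediction pretext task described in the Figure 3 caption can be sketched on a single toy channel as follows. This is an illustration under stated assumptions rather than the paper's implementation: the "distance" is taken to be mean squared error over the masked time steps, the mask is one contiguous span, and a trivial mean-imputer stands in for the encoder and transform head. All names (`mask_span`, `masked_mse`) are hypothetical.

```python
def mask_span(series, start, length):
    """Zero out series[start:start+length]; return the copy and a boolean mask."""
    masked = list(series)
    mask = [False] * len(series)
    for i in range(start, min(start + length, len(series))):
        masked[i] = 0.0
        mask[i] = True
    return masked, mask


def masked_mse(original, reconstruction, mask):
    """Mean squared error restricted to the masked time steps.

    Assumption: the caption only says "distance"; MSE is one common choice.
    """
    errors = [(o - r) ** 2 for o, r, m in zip(original, reconstruction, mask) if m]
    if not errors:
        raise ValueError("mask selects no time steps")
    return sum(errors) / len(errors)


# Toy single-channel segment (values are made up).
signal = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
masked, mask = mask_span(signal, start=2, length=3)

# Placeholder "model": impute masked steps with the mean of the visible ones.
# In the paper this role is played by the E4mer encoder plus a transform head.
visible = [x for x, m in zip(signal, mask) if not m]
fill = sum(visible) / len(visible)
reconstruction = [fill if m else x for x, m in zip(masked, mask)]

loss = masked_mse(signal, reconstruction, mask)
print(masked, loss)
```

Restricting the loss to the masked positions means the model is only scored on what it had to impute, not on the visible time steps it could simply copy through.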