Representation Learning for Wearable-Based Applications in the Case of Missing Data
Janosch Jungo, Yutong Xiang, Shkurta Gashi, Christian Holz
TL;DR
This paper addresses missing data in wearable sensor streams by evaluating masking-based self-supervised learning using transformer architectures for imputation. It contrasts transformer-based reconstruction with traditional imputation methods across 10 signals and assesses effects on downstream HAR and stress detection, revealing robustness for dynamically changing signals and long missing blocks. The work shows that while transformers often outperform baselines, simple methods can suffice for static signals, and it advocates hybrid imputation strategies and SSL-driven pretext tasks to improve real-world wearable analytics. Overall, the findings inform how to design masking-based SSL and imputation pipelines for wearable devices in the presence of substantial missing data, enabling more reliable behavior inference in uncontrolled environments.
Abstract
Wearable devices continuously collect sensor data and use it to infer an individual's behavior, such as sleep, physical activity, and emotions. Despite the significant interest and advancements in this field, modeling multimodal sensor data in real-world environments is still challenging due to low data quality and limited data annotations. In this work, we investigate representation learning for imputing missing wearable data and compare it with state-of-the-art statistical approaches. We investigate the performance of the transformer model on 10 physiological and behavioral signals with different masking ratios. Our results show that transformers outperform baselines for missing data imputation of signals that change more frequently, but not for monotonic signals. We further investigate the impact of imputation strategies and masking rations on downstream classification tasks. Our study provides insights for the design and development of masking-based self-supervised learning tasks and advocates the adoption of hybrid-based imputation strategies to address the challenge of missing data in wearable devices.
