Representation Learning for Wearable-Based Applications in the Case of Missing Data

Janosch Jungo; Yutong Xiang; Shkurta Gashi; Christian Holz

TL;DR

This paper addresses missing data in wearable sensor streams by evaluating masking-based self-supervised learning using transformer architectures for imputation. It contrasts transformer-based reconstruction with traditional imputation methods across 10 signals and assesses effects on downstream HAR and stress detection, revealing robustness for dynamically changing signals and long missing blocks. The work shows that while transformers often outperform baselines, simple methods can suffice for static signals, and it advocates hybrid imputation strategies and SSL-driven pretext tasks to improve real-world wearable analytics. Overall, the findings inform how to design masking-based SSL and imputation pipelines for wearable devices in the presence of substantial missing data, enabling more reliable behavior inference in uncontrolled environments.

Abstract

Wearable devices continuously collect sensor data and use it to infer an individual's behavior, such as sleep, physical activity, and emotions. Despite the significant interest and advancements in this field, modeling multimodal sensor data in real-world environments is still challenging due to low data quality and limited data annotations. In this work, we investigate representation learning for imputing missing wearable data and compare it with state-of-the-art statistical approaches. We evaluate a transformer model on 10 physiological and behavioral signals under different masking ratios. Our results show that transformers outperform baselines for missing data imputation of signals that change more frequently, but not for monotonic signals. We further investigate the impact of imputation strategies and masking ratios on downstream classification tasks. Our study provides insights for the design and development of masking-based self-supervised learning tasks and advocates the adoption of hybrid imputation strategies to address the challenge of missing data in wearable devices.
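The masking-based evaluation described in the abstract can be illustrated with a minimal sketch. It is not the paper's transformer or its exact protocol: it only shows how one might hide a fraction of samples at a given masking ratio, fill them with two simple baselines (mean and linear interpolation), and score the reconstruction on the hidden positions only. All function names, the synthetic signals, and the ratio of 0.3 are assumptions chosen for illustration.

```python
import math
import random


def mask_signal(signal, ratio, seed=0):
    """Hide a fraction `ratio` of samples; return (masked copy, hidden indices)."""
    rng = random.Random(seed)
    # Assumption: endpoints stay observed so interpolation always has anchors.
    candidates = list(range(1, len(signal) - 1))
    hidden = sorted(rng.sample(candidates, int(ratio * len(candidates))))
    masked = list(signal)
    for i in hidden:
        masked[i] = None
    return masked, hidden


def impute_mean(masked):
    """Fill every missing sample with the mean of the observed samples."""
    observed = [v for v in masked if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in masked]


def impute_linear(masked):
    """Fill missing samples by linear interpolation between observed neighbors."""
    filled = list(masked)
    known = [i for i, v in enumerate(masked) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            w = (i - left) / (right - left)
            filled[i] = (1 - w) * masked[left] + w * masked[right]
    return filled


def masked_mae(truth, imputed, hidden):
    """Mean absolute error computed on the hidden positions only."""
    return sum(abs(truth[i] - imputed[i]) for i in hidden) / len(hidden)


if __name__ == "__main__":
    n = 200
    # Hypothetical stand-ins for a monotonic and a rapidly changing signal.
    signals = {
        "ramp": [0.5 * t for t in range(n)],
        "wave": [math.sin(0.9 * t) for t in range(n)],
    }
    for name, sig in signals.items():
        masked, hidden = mask_signal(sig, ratio=0.3)
        print(
            name,
            "mean:", masked_mae(sig, impute_mean(masked), hidden),
            "linear:", masked_mae(sig, impute_linear(masked), hidden),
        )
```

On the synthetic ramp, linear interpolation recovers the hidden samples almost exactly, which mirrors the abstract's observation that simple methods can suffice for monotonic signals. The rapidly changing signal is where a learned model would be expected to help.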

Paper Structure (12 sections, 3 equations, 7 figures, 2 tables)

Introduction
Approach
Results
Conclusion
Appendices
Datasets
Missing Data Problem
Approach
Missing Data Imputation Model
Downstream Classification Model
Results
Acknowledgments

Figures (7)

Figure 1: Overview of our data analysis pipeline.
Figure 2: Performance of imputation strategies in downstream tasks: activity recognition and stress detection.
Figure 3: Performance in downstream tasks using the whole data set (missing data rate 0) and eliminating data (e.g., 0.1).
Figure 4: Overview of the available data per physiological variable. The x-axis shows the amount of data in percentage (%) and the y-axis shows each physiological parameter.
Figure 5: Overview of the missing data patterns for an exemplary set of days. The x-axis shows time in minutes and the y-axis shows the 10 physiological parameters considered in this work.
...and 2 more figures
