MTS-LOF: Medical Time-Series Representation Learning via Occlusion-Invariant Features

Huayu Li, Ana S. Carreon-Rascon, Xiwen Chen, Geng Yuan, Ao Li

TL;DR

The paper tackles the labeling bottleneck in medical time-series analysis by introducing MTS-LOF, a self-supervised framework that combines joint-embedding SSL with masked autoencoding (MAE) and a multi-masking occlusion strategy to learn occlusion-invariant representations from unlabeled data. The backbone patches the input with a 1D CNN and passes the patches to a transformer encoder; training pairs an MAE-inspired objective with a joint-embedding objective and covariance regularization, so that the representations capture both temporal and structural dependencies in medical signals. Experiments on HAR, Sleep-EDF, Epilepsy, SHHS, and FD report better performance than the baselines, good transfer across domains, and gains in semi-supervised settings, which supports the approach’s practicality for label-efficient healthcare analytics. The authors suggest applications in clinical decision support and real-time monitoring on wearables, where robust representations are needed without extensive annotation.

Abstract

Medical time series data are indispensable in healthcare, providing critical insights for disease diagnosis, treatment planning, and patient management. The exponential growth in data complexity, driven by advanced sensor technologies, has presented challenges related to data labeling. Self-supervised learning (SSL) has emerged as a transformative approach to address these challenges, eliminating the need for extensive human annotation. In this study, we introduce a novel framework for Medical Time Series Representation Learning, known as MTS-LOF. MTS-LOF leverages the strengths of contrastive learning and Masked Autoencoder (MAE) methods, offering a unique approach to representation learning for medical time series data. By combining these techniques, MTS-LOF enhances the potential of healthcare applications by providing more sophisticated, context-rich representations. Additionally, MTS-LOF employs a multi-masking strategy to facilitate occlusion-invariant feature learning. This approach allows the model to create multiple views of the data by masking portions of it. By minimizing the discrepancy between the representations of these masked patches and the fully visible patches, MTS-LOF learns to capture rich contextual information within medical time series datasets. The results of experiments conducted on diverse medical time series datasets demonstrate the superiority of MTS-LOF over other methods. These findings hold promise for significantly enhancing healthcare applications by improving representation learning. Furthermore, our work delves into the integration of joint-embedding SSL and MAE techniques, shedding light on the intricate interplay between temporal and structural dependencies in healthcare data. This understanding is crucial, as it allows us to grasp the complexities of healthcare data analysis.
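
The multi-masking step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the choice to drop hidden patches (rather than replace them with a mask token), and the fixed seed are assumptions made for illustration. The values of 20 masks and a mask ratio of 0.8 are the ones quoted in the Figure 4 caption.

```python
import random

def make_occlusion_masks(num_patches, mask_ratio=0.8, num_masks=20, seed=0):
    """Return num_masks boolean lists; True marks a patch that is hidden."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    num_hidden = int(round(mask_ratio * num_patches))
    masks = []
    for _ in range(num_masks):
        # Each mask hides a different random subset of patch positions.
        hidden = set(rng.sample(range(num_patches), num_hidden))
        masks.append([i in hidden for i in range(num_patches)])
    return masks

def apply_mask(patches, mask):
    """Keep only the visible patches (assumption: hidden patches are dropped)."""
    return [p for p, hidden in zip(patches, mask) if not hidden]

# Toy input: 10 patches, each a vector of length 4.
patches = [[float(i)] * 4 for i in range(10)]
masks = make_occlusion_masks(len(patches), mask_ratio=0.8, num_masks=20)
views = [apply_mask(patches, m) for m in masks]
```

Each entry of `views` is one occluded view of the same sample. In the framework, every view would be encoded and its representation compared with that of the fully visible input.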

Paper Structure (18 sections, 10 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Illustration of the backbone network architecture. The backbone is designed to process multivariate time series samples. The input time series is split into patches by a CNN1D, and the patches are passed through the transformer encoder to generate representations. The final representation is obtained by applying a global average pooling layer to the outputs of the transformer encoder. This representation is then fed into a linear classifier to make the final prediction.
  • Figure 2: Illustration of the MTS-LOF framework workflow. The framework leverages Occlusion-Invariant Feature Learning (MAE) and Joint-Embedding SSL principles to enhance representation robustness. It employs multiple mask operations to enhance consistency in occlusion-invariant features, ensuring the model's effectiveness in the presence of occluded data. The similarity objective ($\mathcal{L}_{sim}$) measures the agreement between masked and unmasked representations, while covariance regularization ($\mathcal{L}_{TCR}$) is employed to mitigate representation collapse. A transformer decoder and positional embeddings contribute to comprehensive feature extraction. The hyperparameter $\lambda$ balances $\mathcal{L}_{sim}$ and $\mathcal{L}_{TCR}$.
  • Figure 3: Fine-tuning the pretrained backbone with different fractions of labeled data from the Sleep-EDF dataset. The plot shows the performance of the MTS-LOF framework under semi-supervised learning conditions. It compares the F1 scores obtained when fine-tuning on randomly selected subsets comprising 1%, 5%, 10%, 50%, and 100% of the labeled data against the fully supervised result trained on 100% of the labeled data. These results highlight the framework's ability to make effective use of small amounts of labeled data.
  • Figure 4: Comparison of F1 scores under different combinations of hyperparameters in the ablation study using the HAR dataset. (a) Illustrates the relationship between F1 score and the number of masks while keeping the mask ratio constant at 0.8. (b) Shows the impact of varying mask ratios on the F1 score while maintaining a constant number of 20 masks. These findings provide insights into the sensitivity of the F1 score to these critical hyperparameters.
  • Figure 5: t-SNE visualizations of learned representations for Epilepsy, HAR, and Sleep-EDF datasets. The subfigures display the effects of different training methodologies: SSL, supervised training, and 5% fine-tuning on each dataset.
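
The objective outlined in the Figure 2 caption, a similarity term between masked and unmasked representations plus a regularizer weighted by $\lambda$, can be sketched as follows. This is a hedged illustration only: the captions do not give the exact form of $\mathcal{L}_{sim}$ or $\mathcal{L}_{TCR}$, so the cosine-based similarity, the off-diagonal covariance penalty used as a stand-in for $\mathcal{L}_{TCR}$, the additive combination, and all function names and the default `lam=0.1` are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def similarity_loss(masked_embs, full_emb):
    """Mean (1 - cosine) between each masked-view embedding and the unmasked one."""
    return sum(1.0 - cosine(z, full_emb) for z in masked_embs) / len(masked_embs)

def covariance_penalty(batch):
    """Stand-in regularizer (NOT the paper's L_TCR): squared off-diagonal
    covariance of the embedding dimensions, summed and divided by the dimension."""
    n, d = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in batch]
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                cov = sum(r[i] * r[j] for r in centered) / (n - 1)
                total += cov * cov
    return total / d

def total_loss(masked_embs, full_emb, batch, lam=0.1):
    """Assumed combination: similarity term plus lam times the regularizer."""
    return similarity_loss(masked_embs, full_emb) + lam * covariance_penalty(batch)
```

The similarity term pulls every occluded view toward the representation of the full input, and the regularizer discourages embedding dimensions from becoming redundant, which is one common way to counter the representation collapse mentioned in the caption.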