Learning Unified Representations from Heterogeneous Data for Robust Heart Rate Modeling

Peng Yang, Zhengdong Huang, Zicheng Xie, Wentao Tian, Jingyu Liu, Lunhong Dong

TL;DR

The paper addresses robust heart-rate prediction under real-world data heterogeneity across devices and users. It introduces a unified representation learning framework that combines random feature dropout to handle varying feature sets, a time-aware attention module to capture long-term user context, and a contrastive InfoNCE objective to shape discriminative embeddings, trained jointly with an MSE prediction loss. A new dataset, ParroTao, reflects real-world heterogeneity by preserving device-specific feature sets. Results on FitRec and ParroTao show substantial improvements over strong baselines, along with interpretable, well-clustered user embeddings and a practical route-recommendation application. This work advances robust, personalized heart-rate modeling suitable for deployment in diverse, real-world settings.

Abstract

Heart rate prediction is vital for personalized health monitoring and fitness, but it faces a critical challenge in real-world deployment: data heterogeneity. We characterize this heterogeneity along two key dimensions: source heterogeneity, arising from a fragmented device market with varying feature sets, and user heterogeneity, reflecting distinct physiological patterns across individuals and activities. Existing methods either discard device-specific information or fail to model user-specific differences, limiting their real-world performance. To address this, we propose a framework that learns latent representations agnostic to both forms of heterogeneity, enabling downstream predictors to work consistently under heterogeneous data patterns. Specifically, we introduce a random feature dropout strategy to handle source heterogeneity, making the model robust to varying feature sets. To manage user heterogeneity, we employ a time-aware attention module to capture long-term physiological traits and use a contrastive learning objective to build a discriminative representation space. To reflect the heterogeneous nature of real-world data, we created and publicly released a new benchmark dataset, ParroTao. Evaluations on both ParroTao and the public FitRec dataset show that our model significantly outperforms existing baselines by 17% and 15%, respectively. Furthermore, analysis of the learned representations demonstrates their strong discriminative power, and a downstream application task confirms the practical value of our model.
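
The random feature dropout idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `random_feature_dropout`, the `drop_prob` hyperparameter, the per-sample channel masking, and the rule that at least one feature always survives are all assumptions made for illustration. The sketch zeroes out whole feature channels at random so that a model trained on the result sees many different feature subsets, loosely mimicking devices that record different signals.

```python
import numpy as np


def random_feature_dropout(x, drop_prob, rng):
    """Zero out entire feature channels at random (illustrative sketch).

    x         : array of shape (batch, time, features)
    drop_prob : probability of dropping each channel (assumed hyperparameter)
    rng       : numpy.random.Generator
    Returns the masked array and the boolean keep-mask of shape (batch, features).
    """
    batch, _, n_features = x.shape
    # One keep/drop decision per sample and per feature channel.
    keep = rng.random((batch, n_features)) >= drop_prob
    # Assumption: never drop every channel of a sample; force one back on.
    rows = np.flatnonzero(~keep.any(axis=1))
    forced = rng.integers(0, n_features, size=batch)
    keep[rows, forced[rows]] = True
    # Broadcast the mask over the time axis.
    return x * keep[:, None, :], keep
```

A typical call would be `masked, keep = random_feature_dropout(x, 0.3, np.random.default_rng(0))` during training only; at inference time the features a device does not record would simply be left at zero.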

Paper Structure

This paper contains 30 sections, 9 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Data heterogeneity in wearable data. (a) Three popular wearable devices—Garmin Forerunner 255, Coros Pace 2, and Huawei GT 2—capture different feature sets, indicating source heterogeneity. (b) Users show distinct heart rate distributions under the same activity, highlighting user heterogeneity.
  • Figure 2: Architecture of the proposed framework. Random feature dropout acts on both historical and current inputs to alleviate source heterogeneity. A time-aware attention module compresses the historical record $\mathcal{H}_u$ into a context embedding $\mathbf{u}_u$, which is concatenated with the feature matrix of the current workout plan $\mathbf{X}^{(\mathrm{cur})}$ and fed to a user encoder. Finally, a joint objective combines mean-squared error with an InfoNCE contrastive loss that aligns semantically similar embeddings.
  • Figure 3: t-SNE visualizations of learned feature representations, colored by (a) user identity and (b) sport category. All plots show clear separation between different groups, indicating the effectiveness of the contrastive learning strategy in representation learning.
  • Figure 4: Route recommendation example. (a, b) Topographical profiles and corresponding heart rate responses for two candidate routes, A and B. The close agreement validates the model’s effectiveness in forecasting physiological demands and supporting personalized route selection.
  • Figure 5: Convergence rate of the models on the FitRec dataset.
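
The joint objective described in the Figure 2 caption, mean-squared error combined with an InfoNCE contrastive loss, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's formulation: the use of in-batch negatives, cosine similarity, the `temperature` value, and the scalar `weight` that balances the two terms are all hypothetical choices. How positive pairs are selected is not specified in this excerpt, so the sketch simply treats row `i` of `positives` as the positive for row `i` of `anchors`.

```python
import numpy as np


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE with in-batch negatives (illustrative sketch).

    Row i of `positives` is the positive for row i of `anchors`;
    every other row of `positives` acts as a negative.
    """
    # Cosine similarity via L2-normalised embeddings.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    # Subtract the row-wise max for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-likelihood of the matching (diagonal) pairs.
    return -np.mean(np.diag(log_prob))


def joint_loss(pred, target, anchors, positives, weight=1.0, temperature=0.1):
    """MSE on heart-rate predictions plus a weighted InfoNCE term."""
    mse = np.mean((pred - target) ** 2)
    return mse + weight * info_nce(anchors, positives, temperature)
```

With well-aligned pairs the contrastive term approaches zero and the objective reduces to the prediction error; with mismatched pairs it grows, which is what pushes similar embeddings together and dissimilar ones apart.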