CMD-HAR: Cross-Modal Disentanglement for Wearable Human Activity Recognition

Ying Yu, Siyao Li, Yixuan Jiang, Hang Xiao, Jingxi Long, Haotian Tang, Hanyu Liu, Chao Li

TL;DR

CMD-HAR tackles multimodal sensor-based HAR by combining cross-modal disentanglement with spatiotemporal attention and dynamic gradient balancing, underpinned by an embedded deployment workflow. The method integrates Channel Expansion, spatiotemporal disentanglement, AMGB, and a wearable deployment pipeline, achieving robust recognition across complex activities and on resource-constrained devices. Empirical results across six public HAR datasets show improved accuracy and especially higher G-mean on challenging tasks, along with feasible on-device latency and memory profiles. This work advances practical cross-modal HAR by enabling accurate, efficient recognition in real-world wearables and deployments.
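Because the summary singles out G-mean, a short sketch of how that metric is commonly computed may help. It assumes the usual imbalanced-classification definition, the geometric mean of per-class recall; the paper's exact formulation is not given here and may differ. The function name `g_mean` and the toy labels are illustrative only.

```python
import math
from collections import defaultdict


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition of G-mean)."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    hits = defaultdict(int)    # correctly predicted samples per true class
    totals = defaultdict(int)  # total samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    # A single class with zero recall drives the whole score to zero.
    return math.prod(recalls) ** (1.0 / len(recalls))


# Toy example: class 0 is frequent and always right, class 1 is rare and half right.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = g_mean(y_true, y_pred)
```

In this toy case accuracy is 5/6 while G-mean is the square root of 0.5, which illustrates why G-mean is more sensitive than accuracy to poorly recognized minority activities.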

Abstract

Human Activity Recognition (HAR) is a fundamental technology for numerous human-centered intelligent applications. Although deep learning methods have been used to accelerate feature extraction, multimodal data mixing, activity heterogeneity, and complex model deployment remain largely unresolved in sensor-based HAR. This paper aims to address these three issues. We propose a modal decomposition, alignment, and fusion strategy based on spatiotemporal attention to tackle the mixed distribution of sensor data. Key discriminative features of activities are captured through cross-modal spatiotemporal disentangled representations, and gradient modulation is applied to alleviate data heterogeneity. In addition, we construct a wearable deployment simulation system. Experiments on six public datasets demonstrate the effectiveness of the model.

Paper Structure

This paper contains 19 sections, 11 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Framework of our CMD-HAR method, including (A) channel expansion, (B) spatiotemporal disentanglement module, (C) gradient modulation, and (D) embedded deployment
  • Figure 2: Detailed structure of the proposed CMD-HAR model
  • Figure 3: Performance Comparison on OPPO. and WISDM.
  • Figure 4: Inference delay of each model. (a) WISDM, (b) PAMAP2