CMD-HAR: Cross-Modal Disentanglement for Wearable Human Activity Recognition
Ying Yu, Siyao Li, Yixuan Jiang, Hang Xiao, Jingxi Long, Haotian Tang, Hanyu Liu, Chao Li
TL;DR
CMD-HAR tackles multimodal sensor-based HAR by combining cross-modal disentanglement with spatiotemporal attention and dynamic gradient balancing, supported by an embedded deployment workflow. The method integrates Channel Expansion, spatiotemporal disentanglement, AMGB, and a wearable deployment pipeline to recognize complex activities robustly, including on resource-constrained devices. Results on six public HAR datasets show improved accuracy and, in particular, higher G-mean on challenging tasks, with feasible on-device latency and memory profiles. The work advances practical cross-modal HAR by enabling accurate, efficient recognition on real-world wearable devices.
Abstract
Human Activity Recognition (HAR) is a fundamental technology for numerous human-centered intelligent applications. Although deep learning methods have been used to accelerate feature extraction, issues such as multimodal data mixing, activity heterogeneity, and complex model deployment remain largely unresolved. This paper addresses these issues in sensor-based HAR. We propose a spatiotemporal-attention modal decomposition, alignment, and fusion strategy to tackle the mixed distribution of sensor data. Key discriminative features of activities are captured through a cross-modal spatiotemporal disentangled representation, and gradient modulation is applied alongside it to alleviate data heterogeneity. In addition, we construct a wearable deployment simulation system. Experiments on six public HAR datasets demonstrate the effectiveness of the model.
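
The TL;DR reports G-mean alongside accuracy. As a minimal sketch, assuming G-mean here means the geometric mean of per-class recall (the usual definition for imbalanced classification; this section does not define it), the metric could be computed as follows. The function name `g_mean` and the toy labels are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition of G-mean)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    # Count samples per true class and correctly predicted samples per class.
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [correct[c] / total[c] for c in total]
    # A class that is never recognized drives the G-mean to zero.
    if min(recalls) == 0:
        return 0.0
    return math.exp(sum(math.log(r) for r in recalls) / len(recalls))


# Toy example: a frequent activity and a rare one (hypothetical labels).
y_true = ["walk"] * 8 + ["fall"] * 2
y_pred = ["walk"] * 8 + ["walk", "fall"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = g_mean(y_true, y_pred)
```

In this toy case accuracy is 0.9 while the G-mean is about 0.71, because recall on the rare class is only 0.5. This sensitivity to minority classes is the usual reason to report G-mean on heterogeneous activity data.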
