Unsupervised Statistical Feature-Guided Diffusion Model for Sensor-based Human Activity Recognition
Si Zuo, Vitor Fortes Rey, Sungho Suh, Stephan Sigg, Paul Lukowicz
TL;DR
The paper addresses the scarcity of labeled wearable-sensor data for HAR by introducing SF-DM, an unsupervised diffusion model conditioned on simple statistical features that generates diverse synthetic time-series IMU data without labels. Training follows a two-step pipeline: SF-DM is first pretrained on unlabeled data, then a HAR classifier is trained on the synthetic data and fine-tuned on real data. Because generation is class-agnostic, no per-class generative model is needed. On MM-Fit, PAMAP2, and Opportunity, SF-DM consistently improves accuracy and Macro F1 over conventional oversampling methods and TimeGAN, with substantial gains on several datasets. The approach reduces labeling requirements and could be extended to multi-modal data and other time-series domains, which would make HAR easier to deploy in practice.
Abstract
Human activity recognition (HAR) from on-body sensors is a core functionality in many AI applications: from personal health, through sports and wellness to Industry 4.0. A key problem holding up progress in wearable sensor-based HAR, compared to other ML areas such as computer vision, is the unavailability of diverse and labeled training data. In particular, while there are innumerable annotated images available in online repositories, freely available sensor data is sparse and mostly unlabeled. We propose an unsupervised statistical feature-guided diffusion model specifically optimized for wearable sensor-based human activity recognition with devices such as inertial measurement unit (IMU) sensors. The method generates synthetic labeled time-series sensor data without relying on annotated training data. It thereby addresses the scarcity and annotation difficulties associated with real-world sensor data. By conditioning the diffusion model on statistical information such as mean, standard deviation, Z-score, and skewness, we generate diverse and representative synthetic sensor data. We conducted experiments on public human activity recognition datasets and compared the method to conventional oversampling and state-of-the-art generative adversarial network methods. Experimental results demonstrate that the proposed method improves the performance of human activity recognition and outperforms existing techniques.
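To make the conditioning signal concrete, the following is a minimal sketch of how the four statistics named above could be computed per channel from one window of IMU samples. The function names (`channel_features`, `window_features`), the window layout (a list of time steps, each a list of channel values), the use of the population standard deviation, and the handling of constant signals are all assumptions made for illustration. The abstract does not specify the exact definitions or how the features are fed to the diffusion model.

```python
from statistics import fmean, pstdev


def channel_features(x):
    """Compute illustrative conditioning statistics for one sensor channel.

    x: list of float samples from a single channel of one window.
    Returns mean, population std, skewness, and the per-sample z-scores.
    """
    mu = fmean(x)
    sigma = pstdev(x, mu)  # population standard deviation (assumed choice)
    if sigma == 0.0:
        # Constant signal: define z-scores and skewness as zero (assumption).
        return {"mean": mu, "std": 0.0, "skewness": 0.0,
                "zscore": [0.0] * len(x)}
    z = [(v - mu) / sigma for v in x]
    # Skewness as the third standardized moment: mean of the cubed z-scores.
    skew = fmean([v ** 3 for v in z])
    return {"mean": mu, "std": sigma, "skewness": skew, "zscore": z}


def window_features(window):
    """Apply channel_features to every channel of a (time x channels) window."""
    channels = list(zip(*window))  # transpose to (channels x time)
    return [channel_features(list(ch)) for ch in channels]


if __name__ == "__main__":
    # Hypothetical 5-sample window with 2 channels (e.g. two accelerometer axes).
    window = [[1.0, 0.5], [2.0, 0.5], [3.0, 0.5], [4.0, 0.5], [10.0, 0.5]]
    for i, feats in enumerate(window_features(window)):
        print(i, feats["mean"], feats["std"], feats["skewness"])
```

The sketch uses only the standard library to stay dependency-free. In practice the same statistics would more likely be computed with vectorized array operations over batches of windows.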
