Saga: Capturing Multi-granularity Semantics from Massive Unlabelled IMU Data for User Perception
Yunzhe Li, Facheng Hu, Hongzi Zhu, Shifan Zhang, Liang Zhang, Shan Chang, Minyi Guo
TL;DR
Saga tackles IMU-based user perception with scarce labelled data by applying multi-level masked pre-training to large amounts of unlabelled IMU data. It introduces four masking levels to capture sensor, point, sub-period, and period semantics, and uses Bayesian Optimization with Gaussian Processes to adapt the weights of the pre-training tasks to each downstream task. Experiments across three datasets and five devices show that, with around 100 labels per class, Saga reaches over 90% of the accuracy of a fully supervised model while keeping on-device overhead low, which makes practical deployment feasible. The work advances the understanding of IMU data semantics and provides a scalable, low-label solution for mobile sensing tasks.
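To make the four masking granularities concrete, the following is a minimal sketch and not the authors' implementation. It assumes a window of 6-channel IMU samples (channels 0-2 from one sensor, 3-5 from another), zero-filling of masked entries, and fixed mask sizes; the names `make_mask` and `apply_mask` are hypothetical. In particular, sub-periods and periods are approximated here by short and long contiguous spans at random positions, whereas a real system would first have to locate them in the signal.

```python
import random

def make_mask(level, n_steps, n_channels, rng):
    """Return an n_steps x n_channels grid of booleans; True marks a masked entry."""
    mask = [[False] * n_channels for _ in range(n_steps)]
    if level == "sensor":
        # Assumption: the first half of the channels is one sensor, the rest another.
        half = n_channels // 2
        group = rng.choice([range(0, half), range(half, n_channels)])
        for t in range(n_steps):
            for c in group:
                mask[t][c] = True
    elif level == "point":
        # Hide isolated time steps (here one fifth of the window) on all channels.
        k = max(1, n_steps // 5)
        for t in rng.sample(range(n_steps), k):
            mask[t] = [True] * n_channels
    elif level in ("sub_period", "period"):
        # Hide one contiguous span: short for a sub-period, long for a period.
        span = max(1, n_steps // 10) if level == "sub_period" else max(1, (2 * n_steps) // 5)
        start = rng.randrange(0, n_steps - span + 1)
        for t in range(start, start + span):
            mask[t] = [True] * n_channels
    else:
        raise ValueError(f"unknown masking level: {level}")
    return mask

def apply_mask(window, mask, fill=0.0):
    """Replace masked entries with `fill`; a model would be trained to reconstruct them."""
    return [[fill if m else v for v, m in zip(row, mrow)]
            for row, mrow in zip(window, mask)]

if __name__ == "__main__":
    rng = random.Random(0)
    window = [[1.0] * 6 for _ in range(50)]  # dummy 50-step, 6-channel window
    for level in ("sensor", "point", "sub_period", "period"):
        masked = apply_mask(window, make_mask(level, 50, 6, rng))
        hidden = sum(v == 0.0 for row in masked for v in row)
        print(level, hidden)
```

Each level yields a separate reconstruction task over the same unlabelled window, which is what allows the pre-training tasks to be weighted differently for different downstream applications.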
Abstract
Inertial measurement units (IMUs) are widely used in mobile perception applications such as activity recognition and user authentication, where a large amount of labelled data is normally required to train a satisfactory model. However, labelling micro-activities in massive IMU data is difficult because raw IMU data is hard to interpret and ground truth is lacking. In this paper, we propose a novel fine-grained user perception approach, called Saga, which needs only a small amount of labelled IMU data to achieve high user perception accuracy. The core idea of Saga is to first pre-train a backbone feature extraction model, utilizing the rich semantic information at different levels embedded in massive unlabelled IMU data. Meanwhile, for a specific downstream user perception application, Bayesian Optimization is employed to determine the optimal weights for the pre-training tasks at different semantic levels. We implement Saga on five typical mobile phones and evaluate it on three typical tasks across three IMU datasets. Results show that, using only about 100 training samples per class, Saga achieves over 90% of the accuracy of a full-fledged model trained on over ten thousand training samples, with no additional system overhead.
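One way to read the weighting step is as a two-level optimization, written out below as a sketch. The notation is an assumption introduced here for illustration: $K$ is the number of semantic levels, $\mathcal{L}_k$ the pre-training loss of level $k$, $w_k$ its weight, $\theta$ the backbone parameters, and $\mathrm{Acc}_{\mathrm{val}}$ the downstream validation accuracy. Combining the per-level losses as a weighted sum is also an assumption, not something the abstract states.

```latex
\mathcal{L}_{\mathrm{pre}}(\theta; w) = \sum_{k=1}^{K} w_k \, \mathcal{L}_k(\theta),
\qquad
\theta^{*}(w) = \arg\min_{\theta} \mathcal{L}_{\mathrm{pre}}(\theta; w),
\qquad
w^{*} = \arg\max_{w} \mathrm{Acc}_{\mathrm{val}}\bigl(\theta^{*}(w)\bigr)
```

Under this reading, every evaluation of the outer objective requires a full pre-training and fine-tuning run, which is the kind of expensive black-box function that Bayesian Optimization is designed to optimize in few evaluations.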
