A Masked Semi-Supervised Learning Approach for Otago Micro Labels Recognition
Meng Shang, Lenore Dedeyne, Jolan Dupont, Laura Vercauteren, Nadjia Amini, Laurence Lapauw, Evelien Gielen, Sabine Verschueren, Carolina Varon, Walter De Raedt, Bart Vanrumste
TL;DR
This study tackles the challenge of recognizing micro-activities within the Otago Exercise Program (OEP) using a single waist-worn IMU. It addresses the scarcity of labeled data with a semi-supervised masked Transformer that jointly learns classification and signal reconstruction. The architecture shares Transformer weights between a supervised OS-TCN classifier and an unsupervised reconstruction block, and is trained with the loss $L = \eta L_{CE} + L_{MSE}$ and a high mask ratio to improve contextual feature learning. Results show that the masked Transformer generally improves performance over baselines, with F1-scores exceeding the clinically relevant threshold of $0.8$ in both lab and home settings. The recognized micro-activities also allow automatic extraction of two clinically meaningful outcomes: repetition counts and chair-rising velocity. The work demonstrates the viability of micro-activity recognition in the daily life of older adults, offers a path toward continuous monitoring of exercise adherence and intensity, and suggests future work including additional sensors and broader clinical validation.
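To make the training objective $L = \eta L_{CE} + L_{MSE}$ concrete, the following is a minimal NumPy sketch and not the authors' implementation. The function names, the per-time-step random masking, and the choice to compute the reconstruction error only on masked steps are all assumptions made for illustration; the summary above states only the form of the loss and that a high mask ratio is used.

```python
import numpy as np

def random_mask(n_steps, mask_ratio, rng):
    """Boolean mask over time steps; True marks a step hidden from the encoder.
    Per-step random masking is an assumption, not taken from the paper."""
    n_masked = int(round(mask_ratio * n_steps))
    mask = np.zeros(n_steps, dtype=bool)
    mask[rng.choice(n_steps, size=n_masked, replace=False)] = True
    return mask

def cross_entropy(logits, labels):
    """Mean cross-entropy L_CE; logits has shape (T, C), labels has shape (T,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def masked_mse(signal, reconstruction, mask):
    """L_MSE evaluated on masked time steps only (assumed for illustration)."""
    return float(((signal[mask] - reconstruction[mask]) ** 2).mean())

def joint_loss(logits, labels, signal, reconstruction, mask, eta):
    """L = eta * L_CE + L_MSE, with eta weighting the supervised term."""
    return eta * cross_entropy(logits, labels) + masked_mse(signal, reconstruction, mask)
```

In this sketch, `eta` balances the classification term against the reconstruction term. Unlabeled windows would contribute only through `masked_mse`.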
Abstract
The Otago Exercise Program (OEP) is an important rehabilitation program for older adults that aims to improve strength and balance and thereby prevent falls. Human Activity Recognition (HAR) systems have been widely used to recognize individuals' activities, but for OEP existing systems focus on the duration of macro activities (i.e., a sequence of repetitions of the same exercise) and cannot discern micro activities (i.e., the individual repetitions of the exercises). This study presents a novel semi-supervised machine learning approach that bridges this gap by recognizing the micro activities of OEP. To cope with the limited dataset size, our model uses a Transformer encoder for feature extraction, and the extracted features are classified by a Temporal Convolutional Network (TCN). Simultaneously, the same Transformer encoder is used for masked unsupervised learning to reconstruct the input signals. Results indicate that the masked unsupervised learning task improves the performance of the supervised classification task, as evidenced by F1-scores surpassing the clinically applicable threshold of 0.8. Two clinically relevant outcomes can be derived from the micro activities: the number of repetitions of each exercise and the velocity during chair rising. These outcomes enable automatic monitoring of exercise intensity and difficulty in the daily lives of older adults.
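As a rough illustration of how repetition counts could be derived from frame-level micro-activity predictions, the sketch below counts contiguous runs of each predicted label. It is an assumption-laden example and not the method used in the study: the label names and the `background` class are hypothetical, and it presumes that consecutive repetitions of the same exercise are separated by at least one frame of a different label.

```python
from itertools import groupby

def count_repetitions(frame_labels, background="other"):
    """Count contiguous runs of each non-background label.

    Each run is treated as one repetition. This assumes repetitions of the
    same exercise are separated by frames of another label.
    """
    counts = {}
    for label, _run in groupby(frame_labels):
        if label != background:
            counts[label] = counts.get(label, 0) + 1
    return counts
```

For example, a sequence containing two separated runs of a hypothetical `"knee_bend"` label and one run of `"chair_rise"` would yield a count of 2 and 1, respectively.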
