A Real-Time Human Action Recognition Model for Assisted Living
Yixuan Wang, Paul Stynes, Pramod Pathak, Cristina Muntean
TL;DR
The paper addresses real-time safety monitoring in assisted living by applying transfer learning to four state-of-the-art human action recognition (HAR) models on RGB videos from the NTU RGB+D 60 dataset, classifying Falling, Staggering, Chest Pain, and Normal activities. TimeSformer (divided) emerges as the strongest performer in macro metrics and inference throughput, making it suitable for raising real-time alerts when dangerous events are detected. A live video prediction and alert system is proposed to process streaming video, trigger notifications, and support monitoring across multiple residents, with deployment considerations and cost analyses. The study notes limitations, including the dataset's lack of assisted-living specificity and high compute needs, and suggests future work on skeleton-based HAR and real-world validation to improve robustness and applicability.
Abstract
Ensuring the safety and well-being of elderly and vulnerable populations in assisted living environments is a critical concern. Computer vision, through human action recognition (HAR) applied to video monitoring, offers a powerful approach to predicting health risks. However, predicting human actions in real time with both high performance and high efficiency remains a challenge. This research proposes a real-time human action recognition model that combines a deep learning model with a live video prediction and alert system to predict falls, staggering, and chest pain for residents in assisted living. Six thousand RGB video samples from the NTU RGB+D 60 dataset were selected to create a dataset with four classes: Falling, Staggering, Chest Pain, and Normal, with the Normal class comprising 40 daily activities. Transfer learning was applied to train four state-of-the-art HAR models on a GPU server: UniFormerV2, TimeSformer, I3D, and SlowFast. The four models are compared on class-wise and macro performance metrics, inference efficiency, model complexity, and computational cost. TimeSformer is proposed for the real-time human action recognition model because it achieves the leading macro F1 score (95.33%), recall (95.49%), and precision (95.19%), along with significantly higher inference throughput than the other models. This research provides insights for enhancing the safety and health of the elderly and people with chronic illnesses in assisted living environments, fostering sustainable care, smarter communities, and industry innovation.
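To make the macro metrics reported above concrete, the following is a minimal sketch of how macro-averaged precision, recall, and F1 can be computed for the four classes. It assumes the common convention in which each metric is calculated per class and then averaged without class weighting; the function name `macro_metrics` and the toy labels are illustrative and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Class names as listed in the abstract.
CLASSES = ["Falling", "Staggering", "Chest Pain", "Normal"]


def macro_metrics(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    classes: Sequence[str] = CLASSES,
) -> Dict[str, float]:
    """Return unweighted (macro) averages of per-class precision, recall, F1.

    Assumption: macro F1 is the mean of per-class F1 scores, not the
    harmonic mean of macro precision and macro recall.
    """
    precisions: List[float] = []
    recalls: List[float] = []
    f1s: List[float] = []
    pairs = list(zip(y_true, y_pred))
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Guard against division by zero when a class is never predicted
        # or never present in the ground truth.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    truth = ["Falling", "Falling", "Staggering", "Chest Pain", "Normal", "Normal"]
    preds = ["Falling", "Normal", "Staggering", "Chest Pain", "Normal", "Normal"]
    print(macro_metrics(truth, preds))
```

Because macro averaging gives every class equal weight, a missed Falling clip lowers the score as much as an error on the much larger Normal class, which is why macro metrics are a reasonable choice for a class-imbalanced safety task like this one.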
