IoT-Based Real-Time Medical-Related Human Activity Recognition Using Skeletons and Multi-Stage Deep Learning for Healthcare

Subrata Kumer Paul; Abu Saleh Musa Miah; Rakhi Rani Paul; Md. Ekramul Hamid; Jungpil Shin; Md Abdur Rahim

TL;DR

Problem: Real-time, privacy-preserving recognition of medical-related human activities (MRHA) in healthcare settings. Approach: a hybrid ENConvLSTM pipeline that extracts spatial features from OpenPose skeleton frames with EfficientNet and then models spatio-temporal patterns with ConvLSTM, deployed on IoT hardware for real-time alerts. Contributions: (i) the ENConvLSTM architecture, which achieves state-of-the-art accuracy on NTU RGB+D 120 (cross-subject 94.85%, cross-view 96.45%) and HMDB51 (89.22%); (ii) a scalable, IoT-enabled real-time system using a Raspberry Pi, a GSM module, and Twilio SMS; (iii) a comprehensive ablation study and comparison with existing methods. Impact: enables proactive patient monitoring, improves safety and privacy, and could reduce healthcare costs in home and facility settings.
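For reference, a ConvLSTM cell in its standard textbook form replaces the matrix products of a fully connected LSTM with convolutions, so the input X_t, hidden state H_t, and cell state C_t all stay height × width × channels tensors. Below, * denotes convolution, ∘ the element-wise (Hadamard) product, and σ the logistic sigmoid; in this pipeline X_t would be the EfficientNet feature map of skeleton frame t. This is the generic formulation, not necessarily the authors' exact variant: the peephole terms W_c ∘ C are included here, but this summary does not say whether their implementation keeps them, and some library implementations omit them.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```

The forget gate f_t decides how much of the previous cell state to carry across frames, which is what lets the model link, for example, the early and late phases of a fall.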

Abstract

The Internet of Things (IoT) and mobile technology have significantly transformed healthcare by enabling real-time monitoring and diagnosis of patients. Recognizing medical-related human activities (MRHA) is pivotal for healthcare systems, particularly for identifying actions that are critical to patient well-being. However, Human Motion Recognition (HMR) still faces challenges such as high computational demands, low accuracy, and limited adaptability. While some studies have integrated HMR with IoT for real-time healthcare applications, little research has focused on recognizing MRHA, which is essential for effective patient monitoring. This study proposes a novel HMR method for MRHA detection that integrates multi-stage deep learning with IoT. The approach uses EfficientNet, which comprises seven Mobile Inverted Bottleneck Convolution (MBConv) blocks, to extract optimized spatial features from skeleton frame sequences, followed by ConvLSTM to capture spatio-temporal patterns. A classification module consisting of global average pooling, a fully connected layer, and a dropout layer produces the final predictions. The model is evaluated on the NTU RGB+D 120 and HMDB51 datasets, focusing on MRHA such as sneezing, falling, walking, and sitting. It achieves 94.85% accuracy in cross-subject evaluation and 96.45% in cross-view evaluation on NTU RGB+D 120, along with 89.00% accuracy on HMDB51. In addition, the system integrates IoT capabilities through a Raspberry Pi and a GSM module, delivering real-time alerts to caregivers and patients via Twilio's SMS service. This scalable and efficient solution bridges the gap between HMR and IoT, advancing patient monitoring, improving healthcare outcomes, and reducing costs.
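The classification module and the alert decision described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: the input is taken to be the final ConvLSTM hidden state as an H × W × C nested list; dropout is assumed to sit between the pooled features and the output fully connected layer (the abstract names the three layers but not their wiring); and the class names, the `CRITICAL` set, and the 0.8 confidence threshold are hypothetical. Actual SMS delivery (GSM module or Twilio) is left out; `should_alert` only decides whether an alert is warranted.

```python
import math
import random
from typing import List, Optional, Sequence

# Hypothetical label set: the abstract names sneezing, falling, walking and
# sitting among the MRHA classes; the real models cover more classes.
CLASSES = ["sneezing", "falling", "walking", "sitting"]
# Hypothetical choice of which predictions should trigger a caregiver alert.
CRITICAL = {"sneezing", "falling"}

FeatureMap = Sequence[Sequence[Sequence[float]]]  # H x W x C


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Average an H x W x C feature map over H and W, giving a length-C vector."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    totals = [0.0] * c
    for row in fmap:
        for pixel in row:
            for k in range(c):
                totals[k] += pixel[k]
    return [t / (h * w) for t in totals]


def dropout(x: List[float], rate: float, training: bool, rng: random.Random) -> List[float]:
    """Inverted dropout: while training, zero each unit with probability `rate`
    and scale survivors by 1 / (1 - rate); at inference, return x unchanged."""
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def dense(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer with one weight row per output class."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fmap: FeatureMap, weights, bias, training=False, rate=0.5, seed=0) -> List[float]:
    """GAP -> dropout -> FC -> softmax (layer order is an assumption)."""
    pooled = global_average_pool(fmap)
    dropped = dropout(pooled, rate, training, random.Random(seed))
    return softmax(dense(dropped, weights, bias))


def should_alert(probs: List[float], threshold: float = 0.8) -> Optional[str]:
    """Return the critical label to report, or None if no alert is warranted."""
    best = max(range(len(probs)), key=probs.__getitem__)
    label = CLASSES[best]
    return label if label in CRITICAL and probs[best] >= threshold else None


if __name__ == "__main__":
    # Toy 2 x 2 x 3 feature map and hand-picked weights, purely illustrative.
    fmap = [[[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]]
    weights = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    probs = classify(fmap, weights, bias=[0.0] * 4)
    print(should_alert(probs))  # falling
```

At inference time (`training=False`) dropout is the identity, so predictions are deterministic; during training, inverted dropout rescales surviving units by 1 / (1 - rate) so the expected activation is unchanged.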

Paper Structure

This paper contains 28 sections, 7 equations, 13 figures, and 12 tables.

Introduction
Current Fall Detection Systems and Their Challenges
Emerging Datasets and Research Gaps
Motivation
The Goal and Scope of the Study
Related Work
Dataset
NTU RGB+D 120 dataset
HMDB51 Dataset
Proposed Method
Data Preprocessing
Spatial Temporal Feature Extraction
EfficientNet
ConvLSTM
Classification Module
...and 13 more sections

Figures (13)

Figure 1: Class overlapping among experimenting datasets.
Figure 2: Sample Example of "NTU RGB+D 120" Dataset.
Figure 3: Workflow of the proposed methodology: an end-to-end system for human motion detection using a deep learning model, followed by real-time monitoring and notification.
Figure 4: (a) The proposed multi-stage deep learning model, constructed from (b) EfficientNet and (c) ConvLSTM together with the classification module; (d) the MBConv block.
Figure 5: (a) Configuration of the 25 body joints in our dataset; (b) joint labels: (1) base of the spine, (2) middle of the spine, (3) neck, (4) head, (5) left shoulder, (6) left elbow, (7) left wrist, (8) left hand, (9) right shoulder, (10) right elbow, (11) right wrist, (12) right hand, (13) left hip, (14) left knee, (15) left ankle, (16) left foot, (17) right hip, (18) right knee, (19) right ankle, (20) right foot, (21) spine, (22) tip of the left hand, (23) left thumb, (24) tip of the right hand, (25) right thumb [zhang2023multi_skeleton25_points].
...and 8 more figures
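The 25-joint layout in Figure 5 can be written down as a graph. The sketch below lists the joints from the caption and a hypothetical bone list inferred from the labels (each hand tip and thumb attached to its hand; the neck and both shoulders attached to joint 21). The summary gives neither the authors' connectivity nor how they turn joints into the skeleton frames fed to EfficientNet, so `rasterize` shows only one plausible way: drawing each bone as a line on a small binary image from 2-D coordinates normalized to [0, 1].

```python
from collections import deque
from typing import Dict, List, Tuple

# The 25 joints labelled in Figure 5 (1-based indices, as in the caption).
JOINTS: Dict[int, str] = {
    1: "base of the spine", 2: "middle of the spine", 3: "neck", 4: "head",
    5: "left shoulder", 6: "left elbow", 7: "left wrist", 8: "left hand",
    9: "right shoulder", 10: "right elbow", 11: "right wrist", 12: "right hand",
    13: "left hip", 14: "left knee", 15: "left ankle", 16: "left foot",
    17: "right hip", 18: "right knee", 19: "right ankle", 20: "right foot",
    21: "spine", 22: "tip of the left hand", 23: "left thumb",
    24: "tip of the right hand", 25: "right thumb",
}

# Hypothetical (child, parent) bones inferred from the joint labels;
# not taken from the paper.
BONES: List[Tuple[int, int]] = [
    (2, 1), (21, 2), (3, 21), (4, 3),          # torso and head
    (5, 21), (6, 5), (7, 6), (8, 7),           # left arm
    (9, 21), (10, 9), (11, 10), (12, 11),      # right arm
    (13, 1), (14, 13), (15, 14), (16, 15),     # left leg
    (17, 1), (18, 17), (19, 18), (20, 19),     # right leg
    (22, 8), (23, 8), (24, 12), (25, 12),      # hand tips and thumbs
]


def is_tree(n_joints: int, bones: List[Tuple[int, int]]) -> bool:
    """A skeleton is a tree iff it has n - 1 bones and every joint is reachable."""
    if len(bones) != n_joints - 1:
        return False
    adj: Dict[int, List[int]] = {j: [] for j in range(1, n_joints + 1)}
    for a, b in bones:
        adj[a].append(b)
        adj[b].append(a)
    seen = {1}
    queue = deque([1])
    while queue:  # breadth-first search from the base of the spine
        j = queue.popleft()
        for k in adj[j]:
            if k not in seen:
                seen.add(k)
                queue.append(k)
    return len(seen) == n_joints


def rasterize(coords: Dict[int, Tuple[float, float]], size: int = 32) -> List[List[int]]:
    """Draw every bone as a line on a size x size binary image.

    coords maps joint index -> (x, y), both normalized to [0, 1].
    """
    img = [[0] * size for _ in range(size)]
    steps = 2 * size  # sample spacing below one pixel, so lines have no gaps
    for child, parent in BONES:
        (x0, y0), (x1, y1) = coords[child], coords[parent]
        for s in range(steps + 1):
            t = s / steps
            col = round((x0 + t * (x1 - x0)) * (size - 1))
            row = round((y0 + t * (y1 - y0)) * (size - 1))
            # clamp so slightly out-of-range coordinates stay on the canvas
            img[min(max(row, 0), size - 1)][min(max(col, 0), size - 1)] = 1
    return img
```

Calling `rasterize` once per frame of a clip yields an image stack of the kind a 2-D CNN such as EfficientNet can consume frame by frame.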
