Modeling Human Skeleton Joint Dynamics for Fall Detection

Sania Zahan, Ghulam Mubashar Hassan, Ajmal Mian

TL;DR

This work addresses privacy-friendly fall detection by leveraging skeleton joints as a graph representation to capture rich spatio-temporal dynamics. It introduces a lightweight STCN framework that integrates input embedding of joint positions and velocities, a learnable-adjacency skeleton graph, and multi-path spatial-temporal convolutions (SGCN, TGCN, STCN). Across large-scale datasets (NTU 60/120 and UWA 3D), the method achieves state-of-the-art accuracy with far fewer parameters and faster inference than prior approaches, demonstrating strong generalization in cross-subject and cross-view settings. The resulting approach is well-suited for privacy-preserving, real-time monitoring on embedded platforms, providing robust fall detection without exposing raw appearance data.
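
The position-plus-velocity input mentioned above can be illustrated with a short sketch. It rests on assumptions not stated in this summary: velocity is taken to be the frame-to-frame difference of joint coordinates, the first frame gets zero velocity, and the two streams are concatenated directly, skipping the learned embedding the paper applies first. The function names `joint_velocity` and `two_stream_input` are hypothetical.

```python
import numpy as np


def joint_velocity(positions):
    """Frame-to-frame displacement of every joint.

    positions: array of shape (T, N, C) with T frames, N joints, C coordinates.
    Returns an array of the same shape; frame 0 is all zeros (assumed convention).
    """
    velocity = np.zeros_like(positions)
    velocity[1:] = positions[1:] - positions[:-1]
    return velocity


def two_stream_input(positions):
    """Concatenate positions and velocities along the channel axis.

    Simplification: the paper embeds each stream before concatenation;
    here the raw streams are joined, giving shape (T, N, 2 * C).
    """
    return np.concatenate([positions, joint_velocity(positions)], axis=-1)


# Toy sequence: 4 frames, 25 joints, 3 coordinates. Joint 0 drops along the y axis.
pos = np.zeros((4, 25, 3))
pos[:, 0, 1] = [1.0, 0.8, 0.4, 0.0]
features = two_stream_input(pos)
```

In this toy sequence the y-velocity of joint 0 grows in magnitude as the drop accelerates, which is the kind of dynamic cue a position-only input would leave the network to infer on its own.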

Abstract

The increasing pace of population aging calls for better care and support systems. Falling is a frequent and critical problem for elderly people, causing serious long-term health issues. Fall detection from video streams is not an attractive option for real-life applications due to privacy issues. Existing methods try to resolve this issue by using very low-resolution cameras or video encryption. However, privacy cannot be ensured completely with such approaches. Key points on the body, such as skeleton joints, can convey significant information about motion dynamics and successive posture changes, which are crucial for fall detection. Skeleton joints have been explored for feature extraction, but with image recognition models that ignore joint dependency across frames, which is important for the classification of actions. Moreover, existing models are over-parameterized or evaluated on small datasets with very few activity classes. We propose an efficient graph convolution network model that exploits spatio-temporal joint dependencies and dynamics of human skeleton joints for accurate fall detection. Our method leverages a dynamic representation with robust concurrent spatio-temporal characteristics of skeleton joints. We performed extensive experiments on three large-scale datasets. With a significantly smaller model size than most existing methods, our proposed method achieves state-of-the-art results on the large-scale NTU datasets.

Paper Structure

This paper contains 16 sections, 4 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Fall injury death cases, by age group and sex, 2017–18, from the AIHW National Hospital Morbidity Database [AIHW_FALL].
  • Figure 2: Fall-related injury hospitalisation cases, by type of injury and sex, 2017–18, from the AIHW National Hospital Morbidity Database [AIHW_FALL].
  • Figure 3: Percentage of different sensor types used in fall detection systems from 2014 to 2018. “Kinect & accelerometer” represents systems using both sensors [Tao_2018].
  • Figure 4: Architecture overview: The input embedding has a two-layer 1D-CNN block that normalizes the frames and creates an enhanced projection; the embedded joint and velocity streams are then concatenated. Each basic block deploys multiple pathways of spatial-temporal and Conv2D modules to capture spatio-temporal dependencies.
  • Figure 5: Illustration of joint adjacency with 3-hop distance. (a): skeleton representation of neighbors (blue) of the spine joint (red). (b): $N\times N$ matrix representation for all 25 joints.
  • ...and 1 more figures
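
The 3-hop joint adjacency described in the Figure 5 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the bone list `TOY_EDGES` is a hypothetical 6-joint chain rather than the 25-joint skeleton, and the helper names `hop_distance` and `k_hop_adjacency` are invented here. For the 25-joint case one would pass the dataset's own bone list and `num_joints=25`.

```python
import numpy as np


def hop_distance(num_joints, edges, max_hop):
    """Shortest number of bones between every pair of joints.

    Returns an (N, N) float matrix; pairs farther apart than max_hop are np.inf.
    """
    # One-hop adjacency with self-loops, so powers accumulate "within k hops".
    adj = np.eye(num_joints, dtype=int)
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1

    dist = np.full((num_joints, num_joints), np.inf)
    reach = np.eye(num_joints, dtype=int)  # joints reachable in 0 hops
    dist[reach > 0] = 0
    for hop in range(1, max_hop + 1):
        reach = (reach @ adj > 0).astype(int)  # reachable in at most `hop` hops
        newly_reached = (reach > 0) & np.isinf(dist)
        dist[newly_reached] = hop
    return dist


def k_hop_adjacency(num_joints, edges, max_hop=3):
    """Binary (N, N) mask: 1 where two joints are at most max_hop bones apart."""
    return (hop_distance(num_joints, edges, max_hop) <= max_hop).astype(np.float32)


# Hypothetical toy skeleton: a chain of joints 0-1-2-3-4-5.
TOY_EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
A = k_hop_adjacency(6, TOY_EDGES, max_hop=3)
```

Row `A[0]` marks joints 0 through 3 as neighbors of joint 0 and leaves joints 4 and 5 unmarked, the same pattern the figure shows for the spine joint. A learnable adjacency, as mentioned in the TL;DR, would typically start from such a mask and let training adjust the edge weights; that step is not shown here.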