Fusing Biomechanical and Spatio-Temporal Features for Fall Prediction: Characterizing and Mitigating the Simulation-to-Reality Gap

Md Fokhrul Islam; Sajeda Al-Hammouri; Christopher J. Arellano; Kavan Hazeli; Heman Shakeri

TL;DR

This work tackles imminent fall prediction from vision data, where real fall recordings are scarce and models trained on simulated falls face a persistent simulation–reality gap. It introduces BioST-GCN, a dual-stream architecture that fuses pose-based spatio-temporal features (from an ST-GCN with Body Attention) with engineered biomechanical features processed by a BiLSTM; the two streams are joined by a cross-attention module, AttFusion. With full supervision, BioST-GCN reaches an F1 of about 89% and an AUPRC of about 91.1% in the intra-subject setting, and it improves on the vanilla ST-GCN baseline by 5.32% and 2.91% F1 on the MCF-UA and MUVIM datasets, respectively. Zero-shot transfer to unseen subjects, however, lowers F1 to about 35.9%, while few-shot personalization recovers performance quickly, which points to the need for model personalization and richer, bias-aware data. The study characterizes this gap as especially critical for elderly populations and calls for privacy-preserving data pipelines and domain adaptation strategies to turn these results into clinically reliable tools.
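The summary above does not give the internals of AttFusion, so the following is only a minimal sketch of generic scaled dot-product cross-attention between two feature streams, not the authors' implementation. It assumes that the pose stream supplies the queries and the biomechanical stream supplies the keys and values, and that the fused output is a simple concatenation. The function name `cross_attention_fuse`, the random projection matrices, and all dimensions are hypothetical and chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention_fuse(pose_feats, bio_feats, w_q, w_k, w_v):
    """Fuse two per-frame feature sequences with cross-attention.

    pose_feats: (T, d_pose) features from the pose stream (assumed queries).
    bio_feats:  (T, d_bio) features from the biomechanical stream
                (assumed keys and values).
    Returns the fused features (T, 2 * d) and the attention map (T, T).
    """
    q = pose_feats @ w_q                      # (T, d)
    k = bio_feats @ w_k                       # (T, d)
    v = bio_feats @ w_v                       # (T, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (T, T) scaled dot products
    attn = softmax(scores, axis=-1)           # each row sums to 1
    attended = attn @ v                       # (T, d) biomechanics seen from pose
    fused = np.concatenate([q, attended], axis=-1)  # assumed fusion: concatenation
    return fused, attn


# Hypothetical sizes: 30 frames, 64-d pose features, 16-d biomechanical features.
rng = np.random.default_rng(0)
T, d_pose, d_bio, d = 30, 64, 16, 32
pose_feats = rng.standard_normal((T, d_pose))
bio_feats = rng.standard_normal((T, d_bio))

# Random projections stand in for learned weights.
w_q = rng.standard_normal((d_pose, d)) / np.sqrt(d_pose)
w_k = rng.standard_normal((d_bio, d)) / np.sqrt(d_bio)
w_v = rng.standard_normal((d_bio, d)) / np.sqrt(d_bio)

fused, attn = cross_attention_fuse(pose_feats, bio_feats, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

In a trained model the projections would be learned, and each row of `attn` would indicate which biomechanical time steps a given pose frame draws on.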

Abstract

Falls are a leading cause of injury and loss of independence among older adults. Vision-based fall prediction systems offer a non-invasive solution to anticipate falls seconds before impact, but their development is hindered by the scarcity of available fall data. This study proposes the Biomechanical Spatio-Temporal Graph Convolutional Network (BioST-GCN), a dual-stream model that combines pose and biomechanical information through a cross-attention fusion mechanism. Our model outperforms the vanilla ST-GCN baseline by 5.32% and 2.91% F1-score on the simulated MCF-UA stunt-actor and MUVIM datasets, respectively. The spatio-temporal attention mechanisms in the ST-GCN stream also provide interpretability by identifying critical joints and temporal phases. However, a critical simulation-reality gap persists. While our model achieves an 89.0% F1-score with full supervision on simulated data, its F1-score drops to 35.9% under zero-shot generalization to unseen subjects. This performance decline is likely due to biases in simulated data, such as 'intent-to-fall' cues. For older adults, particularly those with diabetes or frailty, this gap is exacerbated by their unique kinematic profiles. To address this, we propose personalization strategies and advocate for privacy-preserving data pipelines to enable real-world validation. Our findings underscore the urgent need to bridge the gap between simulated and real-world data to develop effective fall prediction systems for vulnerable elderly populations.
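To make the size of the reported gap concrete, the short sketch below recalls the standard F1 definition and computes the absolute and relative drop between the two F1 values quoted in the abstract (89.0% and 35.9%). The confusion counts passed to `f1_from_counts` are hypothetical and serve only to illustrate the formula; they are not results from the paper.

```python
def f1_from_counts(tp, fp, fn):
    """Standard F1 score: harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical counts, only to illustrate the formula.
example_f1 = f1_from_counts(tp=8, fp=2, fn=2)

# F1 values quoted in the abstract, in percent.
supervised_f1 = 89.0
zero_shot_f1 = 35.9

abs_drop = supervised_f1 - zero_shot_f1        # drop in percentage points
rel_drop = abs_drop / supervised_f1 * 100.0    # drop relative to the supervised F1

print(f"example F1: {example_f1:.2f}")
print(f"absolute drop: {abs_drop:.1f} points")
print(f"relative drop: {rel_drop:.1f}%")
```

Under these figures the zero-shot setting loses 53.1 F1 points, roughly 59.7% of the fully supervised score.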
