WiFi based Human Fall and Activity Recognition using Transformer based Encoder Decoder and Graph Neural Networks
Younggeol Cho, Elisa Motta, Olivia Nocentini, Marta Lagomarsino, Andrea Merello, Marco Crepaldi, Arash Ajoudani
TL;DR
This work tackles privacy concerns in fall detection by leveraging WiFi CSI to estimate human skeletons and perform action recognition without cameras. It introduces TED-Net, a Transformer-augmented encoder-decoder that derives 17 2D keypoints from CSI across three antennas, and a Directed Graph Neural Network (DGNN) that classifies actions using CSI-derived skeletons with frame-level granularity. Across two datasets (MM-Fi and a custom fall-focused collection), the authors demonstrate that TED-Net outperforms existing CSI-based pose estimators and that DGNN achieves near-RGB performance for fall detection, validating the privacy-preserving viability of WiFi-based sensing. The results highlight the practical potential for home-based, vision-free monitoring of elderly individuals, while acknowledging limitations in spatial resolution and environmental sensitivity that invite future work on 3D pose and multi-person scenarios.
Abstract
Human pose estimation and action recognition have received attention due to their critical roles in healthcare monitoring, rehabilitation, and assistive technologies. In this study, we propose a novel architecture, the Transformer-based Encoder-Decoder Network (TED-Net), designed for estimating human skeleton poses from WiFi Channel State Information (CSI). TED-Net integrates convolutional encoders with transformer-based attention mechanisms to capture spatiotemporal features from CSI signals. The estimated skeleton poses are used as input to a customized Directed Graph Neural Network (DGNN) for action recognition. We validate our model on two datasets: a publicly available multi-modal dataset for assessing general pose estimation, and a newly collected dataset focused on fall-related scenarios involving 20 participants. Experimental results demonstrate that TED-Net outperforms existing approaches in pose estimation and that the DGNN achieves reliable action classification using CSI-based skeletons, with performance comparable to RGB-based systems. Notably, TED-Net maintains robust performance across both fall and non-fall cases. These findings highlight the potential of CSI-driven human skeleton estimation for effective action recognition, particularly for home applications such as elderly fall detection. In such settings, WiFi signals are often readily available, offering a privacy-preserving alternative to vision-based methods, which may raise concerns about continuous camera monitoring.
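To make the hand-off from pose estimation to graph-based action recognition concrete, the following is a minimal sketch of how a single frame of 17 estimated 2D keypoints could be turned into directed-graph inputs (joints as vertices, bones as directed edges). It is not the authors' implementation: the COCO-style keypoint ordering, the outward-from-nose edge direction, and the names `KEYPOINTS`, `BONES`, `bone_vectors`, and `incidence_matrices` are all assumptions made for illustration.

```python
# Hypothetical 17-keypoint layout (COCO-style ordering). The ordering used
# by the paper is not stated in this section, so this is an assumption.
KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# Directed bones as (source, target) joint indices, pointing away from the
# nose. The direction convention is an illustrative choice.
BONES = [
    (0, 1), (0, 2), (1, 3), (2, 4),        # head
    (0, 5), (0, 6),                        # nose to shoulders
    (5, 7), (7, 9), (6, 8), (8, 10),       # arms
    (5, 11), (6, 12),                      # torso
    (11, 13), (13, 15), (12, 14), (14, 16) # legs
]


def bone_vectors(joints):
    """Return one 2D vector per directed bone (target minus source)."""
    if len(joints) != len(KEYPOINTS):
        raise ValueError("expected %d keypoints" % len(KEYPOINTS))
    return [
        (joints[t][0] - joints[s][0], joints[t][1] - joints[s][1])
        for s, t in BONES
    ]


def incidence_matrices(num_joints, bones):
    """Build source and target incidence matrices (joints x bones).

    source[j][e] is 1 if bone e leaves joint j;
    target[j][e] is 1 if bone e enters joint j.
    """
    source = [[0] * len(bones) for _ in range(num_joints)]
    target = [[0] * len(bones) for _ in range(num_joints)]
    for e, (s, t) in enumerate(bones):
        source[s][e] = 1
        target[t][e] = 1
    return source, target


if __name__ == "__main__":
    # Synthetic frame: joint i placed at (i, 2i), purely for demonstration.
    frame = [(float(i), float(2 * i)) for i in range(len(KEYPOINTS))]
    edges = bone_vectors(frame)
    src, tgt = incidence_matrices(len(KEYPOINTS), BONES)
    print(len(edges), "bones; first bone vector:", edges[0])
```

In a frame-level classifier, the joint coordinates and bone vectors computed this way would typically be stacked over time and passed, together with the incidence matrices, to the graph network. The exact features used by the customized DGNN are not described in this section.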
