A Lightweight 3D-CNN for Event-Based Human Action Recognition with Privacy-Preserving Potential
Mehdi Sefidgar Dilmaghani, Francis Fowley, Peter Corcoran
TL;DR
This work addresses privacy concerns in human action recognition (HAR) by combining event-based vision with a compact 3D-CNN designed for edge deployment. The authors convert neuromorphic event streams into low-resolution grayscale frames and train a five-block 3D-CNN with focal loss to handle class imbalance, achieving an accuracy of $94.17\%$ and an F1-score of $0.9415$ on a balanced, cross-dataset 6-class HAR task. The model outperforms standard 3D-CNN baselines (C3D, ResNet3D, MC3_18) while remaining substantially smaller and cheaper at inference, with a total training time of around $322$ minutes. The practical payoff is privacy-preserving, real-time HAR suitable for healthcare, surveillance, and smart environments; future work explores end-to-end event processing and spiking neural networks for further efficiency gains.
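To make the event-to-frame step concrete, the sketch below shows one common way to accumulate an event stream into grayscale frames: count events per pixel inside fixed time windows, then rescale the counts to the 0-255 range. This is an illustration only. The summary does not specify the authors' conversion procedure, so the function name `events_to_frames`, the `(x, y, t_us, polarity)` event layout, the integer microsecond timestamps, and the fixed window length are all assumptions.

```python
def events_to_frames(events, width, height, window_us):
    """Accumulate events into per-window grayscale frames (hypothetical helper).

    events: list of (x, y, t_us, polarity) tuples with integer timestamps.
    Returns a list of frames; each frame is a height x width grid of ints in 0..255.
    """
    if not events:
        return []

    t0 = min(e[2] for e in events)
    t_end = max(e[2] for e in events)
    n_frames = (t_end - t0) // window_us + 1

    # One count grid per time window, indexed as counts[frame][row][col].
    counts = [[[0] * width for _ in range(height)] for _ in range(n_frames)]
    for x, y, t_us, _polarity in events:
        # Polarity is ignored here: only event activity per pixel is kept.
        counts[(t_us - t0) // window_us][y][x] += 1

    frames = []
    for grid in counts:
        peak = max(max(row) for row in grid)
        if peak == 0:
            frames.append([[0] * width for _ in range(height)])
        else:
            # Integer rescale so the busiest pixel maps to 255.
            frames.append([[(c * 255) // peak for c in row] for row in grid])
    return frames


# Tiny example: four events on a 2x2 sensor, split into 1000-microsecond windows.
demo_events = [(0, 0, 0, 1), (0, 0, 10, 1), (1, 1, 20, 0), (1, 0, 1500, 1)]
demo_frames = events_to_frames(demo_events, width=2, height=2, window_us=1000)
```

Stacking consecutive frames from such a routine gives the (time, height, width) volume that a 3D-CNN consumes. Because each frame keeps only where brightness changed, it carries motion cues without the appearance detail of a conventional image.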
Abstract
This paper presents a lightweight three-dimensional convolutional neural network (3D-CNN) for human activity recognition (HAR) using event-based vision data. Privacy preservation is a key challenge in human monitoring systems, as conventional frame-based cameras capture identifiable personal information. In contrast, event cameras record only changes in pixel intensity, providing an inherently privacy-preserving sensing modality. The proposed network effectively models both spatial and temporal dynamics while maintaining a compact design suitable for edge deployment. To address class imbalance and enhance generalization, focal loss with class reweighting and targeted data augmentation strategies are employed. The model is trained and evaluated on a composite dataset derived from the Toyota Smart Home and ETRI datasets. Experimental results demonstrate an F1-score of 0.9415 and an overall accuracy of 94.17%, outperforming benchmark 3D-CNN architectures such as C3D, ResNet3D, and MC3_18 by up to 3%. These results highlight the potential of event-based deep learning for developing accurate, efficient, and privacy-aware human action recognition systems suitable for real-world edge applications.
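The focal loss with class reweighting mentioned in the abstract can be illustrated with a minimal, dependency-free sketch of the standard class-weighted formulation, FL = -w_c (1 - p_t)^gamma log(p_t), for a single sample. The abstract does not report the focusing parameter or the class weights, so `gamma=2.0` and the weight values used below are assumptions, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, class_weights, gamma=2.0):
    """Class-weighted focal loss for one sample (illustrative sketch).

    logits: raw scores, one per class.
    target: index of the true class.
    class_weights: per-class weights (assumed values, e.g. inverse frequency).
    gamma: focusing parameter; gamma = 0 reduces to weighted cross-entropy.
    """
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Six classes, matching the 6-class task; the weights here are made up.
uniform_logits = [0.0] * 6
equal_weights = [1.0] * 6
confident_logits = [8.0, 0.0, 0.0, 0.0, 0.0, 0.0]

loss_hard = focal_loss(uniform_logits, 0, equal_weights)
loss_easy = focal_loss(confident_logits, 0, equal_weights)
```

The factor (1 - p_t)^gamma shrinks the contribution of samples the model already classifies confidently, so `loss_easy` is much smaller than `loss_hard`, while the per-class weight scales the loss for under-represented classes.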
