A Lightweight 3D-CNN for Event-Based Human Action Recognition with Privacy-Preserving Potential

Mehdi Sefidgar Dilmaghani, Francis Fowley, Peter Corcoran

TL;DR

This work tackles privacy concerns in human action recognition (HAR) by combining event-based vision with a compact 3D-CNN designed for edge deployment. The authors convert neuromorphic event streams into low-resolution grayscale frames and train a five-block 3D-CNN with focal loss to handle class imbalance, reaching 94.17% accuracy and an F1-score of 0.9415 on a balanced, cross-dataset, six-class HAR task. The model outperforms standard 3D-CNN baselines (C3D, ResNet3D, MC3_18) while having a substantially smaller model size and lower inference cost, with a total training time of about 322 minutes. Practical applications include privacy-preserving, real-time HAR for healthcare, surveillance, and smart environments; future work will explore end-to-end event processing and spiking neural networks for further efficiency gains.
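The summary does not specify how event streams become grayscale frames (window length, polarity handling, or target resolution). As a minimal sketch under stated assumptions, events can be binned into equal-duration time windows and rendered as per-pixel event counts scaled to 0..255. The tuple layout `(x, y, timestamp_us, polarity)`, the function name `events_to_frames`, and the count-based encoding are all hypothetical; downsampling to a low-resolution grid is left out.

```python
from typing import List, Sequence, Tuple

# Hypothetical event layout: (x, y, timestamp_us, polarity), polarity in {-1, +1}.
Event = Tuple[int, int, int, int]
Frame = List[List[int]]


def events_to_frames(
    events: Sequence[Event], width: int, height: int, num_frames: int
) -> List[Frame]:
    """Bin events into `num_frames` equal-duration windows and render each
    window as an 8-bit grayscale frame of per-pixel event counts."""
    frames = [[[0] * width for _ in range(height)] for _ in range(num_frames)]
    if not events:
        return frames
    t0 = min(e[2] for e in events)
    span = max(max(e[2] for e in events) - t0, 1)
    for x, y, t, _polarity in events:
        # Window index for this timestamp; the final timestamp is clamped
        # into the last window.
        idx = min((t - t0) * num_frames // span, num_frames - 1)
        frames[idx][y][x] += 1  # polarity ignored: events of both signs are counted
    for frame in frames:
        # Scale so the busiest pixel in each frame maps to 255.
        peak = max(max(row) for row in frame)
        if peak > 0:
            for row in frame:
                row[:] = [round(255 * v / peak) for v in row]
    return frames


events = [(0, 0, 0, 1), (1, 0, 10, -1), (1, 0, 20, 1), (0, 1, 100, 1)]
frames = events_to_frames(events, width=2, height=2, num_frames=2)
print(frames)  # [[[128, 255], [0, 0]], [[0, 0], [255, 0]]]
```

A stack of such frames forms the (time, height, width) clip that a 3D-CNN consumes; only intensity changes appear, which is the source of the privacy argument.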

Abstract

This paper presents a lightweight three-dimensional convolutional neural network (3D-CNN) for human activity recognition (HAR) using event-based vision data. Privacy preservation is a key challenge in human monitoring systems, as conventional frame-based cameras capture identifiable personal information. In contrast, event cameras record only changes in pixel intensity, providing an inherently privacy-preserving sensing modality. The proposed network effectively models both spatial and temporal dynamics while maintaining a compact design suitable for edge deployment. To address class imbalance and enhance generalization, focal loss with class reweighting and targeted data augmentation strategies are employed. The model is trained and evaluated on a composite dataset derived from the Toyota Smart Home and ETRI datasets. Experimental results demonstrate an F1-score of 0.9415 and an overall accuracy of 94.17%, outperforming benchmark 3D-CNN architectures such as C3D, ResNet3D, and MC3_18 by up to 3%. These results highlight the potential of event-based deep learning for developing accurate, efficient, and privacy-aware human action recognition systems suitable for real-world edge applications.
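The abstract names focal loss with class reweighting. A minimal sketch of the common multi-class form $FL(p_t) = -\alpha_t (1 - p_t)^\gamma \log p_t$ follows, where $p_t$ is the softmax probability of the true class. The "balanced" inverse-frequency weights $\alpha_k = n / (K \cdot n_k)$ and the default $\gamma = 2$ are assumptions, not values reported in this summary, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def balanced_class_weights(labels: Sequence[int], num_classes: int) -> List[float]:
    """Assumed reweighting: n_samples / (n_classes * count_k).
    Classes with no samples get weight 0."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (num_classes * c) if c else 0.0 for c in counts]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax (subtracts the max logit first)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def focal_loss(
    batch_logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    alpha: Sequence[float],
    gamma: float = 2.0,
) -> float:
    """Mean multi-class focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    total = 0.0
    for logits, t in zip(batch_logits, targets):
        log_p_t = log_softmax(logits)[t]
        p_t = math.exp(log_p_t)
        # (1 - p_t)**gamma shrinks the loss of already well-classified samples.
        total += -alpha[t] * (1.0 - p_t) ** gamma * log_p_t
    return total / len(targets)


alpha = balanced_class_weights([0, 0, 0, 1], num_classes=2)
print(alpha)  # [0.6666666666666666, 2.0]
print(focal_loss([[2.0, 0.0], [0.5, 1.5]], [0, 1], alpha))
```

With $\gamma = 0$ and all $\alpha_k = 1$ the expression reduces to ordinary cross-entropy, so $\gamma$ and $\alpha$ separately control the focus on hard samples and on rare classes.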

Paper Structure

This paper contains 37 sections, 1 equation, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Training and validation loss graphs of the proposed method.
  • Figure 2: Training and validation loss graphs of the proposed method.
  • Figure 3: Confusion matrices of networks: (a) C3D, (b) ResNet3D, (c) MC3_18, (d) Proposed method.
  • Figure 4: Example frames from video samples misclassified by the proposed network.
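Figure 3 compares the networks through confusion matrices, and the headline results are accuracy and F1-score. The summary does not state how the F1-score is averaged over classes, so the following sketch assumes macro-averaging; the helper names are hypothetical and the labels are toy data, not results from the paper.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def confusion_matrix(
    y_true: Sequence[int], y_pred: Sequence[int], num_classes: int
) -> Matrix:
    """Rows index the true class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(cm: Matrix) -> float:
    """Fraction of samples on the diagonal."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


def macro_f1(cm: Matrix) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp  # true k, predicted class differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
print(cm)  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(accuracy(cm), macro_f1(cm))
```

On a balanced test set, as described for this task, accuracy and macro-F1 tend to be close, which is consistent with the reported 94.17% and 0.9415.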