FLAMe: Federated Learning with Attention Mechanism using Spatio-Temporal Keypoint Transformers for Pedestrian Fall Detection in Smart Cities

Byeonghun Kim, Byeongjoon Noh

TL;DR

Pedestrian fall detection in smart cities faces privacy and bandwidth constraints when using centralized CCTV data. The authors introduce FLAMe, an attention-guided federated learning framework that trains on local pose-keypoint features and transmits only important weights to a central server, reducing communication while preserving privacy. The approach combines a lightweight spatio-temporal keypoint transformer with pose-based preprocessing and an attention-driven FL strategy, validated on the AI-Hub Fall dataset to achieve around 94% accuracy with substantial communication savings and robust performance under non-IID data. The work demonstrates a practical, scalable solution for distributed fall detection in urban environments, enabling safer public spaces with efficient resource use.

Abstract

In smart cities, detecting pedestrian falls is a major challenge for ensuring citizens' safety and quality of life. In this study, we propose a novel fall detection system using FLAMe (Federated Learning with Attention Mechanism), a federated learning (FL) based algorithm. FLAMe trains on important keypoint information and transmits only the trained important weights to the server, reducing communication costs and preserving data privacy. Furthermore, a lightweight keypoint transformer model is integrated into the FL framework to effectively learn spatio-temporal features. We validated the system using 22,672 video samples from the "Fall Accident Risk Behavior Video-Sensor Pair data" dataset from AI-Hub. In the experiments, the FLAMe-based system achieved an accuracy of 94.02% with about 190,000 transmitted parameters, maintaining performance similar to that of centralized learning while reducing communication costs by about 40% compared to the existing FL algorithm FedAvg. These results indicate that FLAMe provides robust performance in the distributed environment of smart cities and is a practical and effective solution for public safety.
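
As a rough consistency check on the two figures in the abstract, the implied FedAvg payload can be estimated. This assumes the roughly 40% saving is measured in transmitted parameters per upload, which the abstract does not state explicitly. The symbols $P_{\text{FLAMe}}$ and $P_{\text{FedAvg}}$ are introduced here for illustration only, and the result is an inference, not a reported number.

$$P_{\text{FedAvg}} \approx \frac{P_{\text{FLAMe}}}{1 - 0.40} = \frac{190{,}000}{0.60} \approx 317{,}000$$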

Paper Structure

This paper contains 16 sections, 6 equations, 17 figures, 3 tables, 1 algorithm.

Figures (17)

  • Figure 1: Architecture of the spatio-temporal keypoint transformer model
  • Figure 2: Framework of the proposed FLAMe algorithm. At the beginning of each round $t$, client $c$ downloads the global model weights $w_{t-1}^s$ of the previous round from the server. Each client then trains its local model, selects weights $w_t^{c_k}$ ($k=1, 2, \ldots, K$) for important key points, and uploads them to the server. The server aggregates the weights and calculates the new global model weights $w_t^s$.
  • Figure 4: Bar chart example of the distribution of labels for each client
  • Figure 6: Validation accuracy curves of the proposed and baseline models
  • Figure 7: Attention maps of the temporal encoder and spatial encoder as training progresses.
  • ...and 12 more figures
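
The round described in the Figure 2 caption (download the global weights, train locally, upload only the weights tied to important keypoints, aggregate on the server) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The per-keypoint weight layout, the importance scores, the helper names (`select_top_k`, `aggregate`, `count_params`), and the rule "average each block over the clients that uploaded it, otherwise keep the previous global value" are all hypothetical choices made for the example. Local training is replaced by hard-coded weights.

```python
from typing import Dict, List

# Hypothetical layout: one small weight block per keypoint index.
Weights = Dict[int, List[float]]


def select_top_k(local: Weights, importance: Dict[int, float], k: int) -> Weights:
    """Keep only the weight blocks of the k keypoints with the highest score."""
    top = sorted(importance, key=lambda kp: importance[kp], reverse=True)[:k]
    return {kp: list(local[kp]) for kp in top}


def aggregate(global_prev: Weights, uploads: List[Weights]) -> Weights:
    """Average each block over the clients that sent it (assumed rule).

    Blocks that no client uploaded keep their previous global value.
    """
    new_global: Weights = {}
    for kp, prev_block in global_prev.items():
        received = [u[kp] for u in uploads if kp in u]
        if not received:
            new_global[kp] = list(prev_block)
        else:
            new_global[kp] = [sum(vals) / len(received) for vals in zip(*received)]
    return new_global


def count_params(upload: Weights) -> int:
    """Number of scalar parameters contained in one upload."""
    return sum(len(block) for block in upload.values())


# Global weights from the previous round (w_{t-1}^s in the caption).
global_w: Weights = {0: [0.0, 0.0], 1: [0.0, 0.0], 2: [0.0, 0.0]}

# Stand-ins for locally trained weights and per-keypoint importance scores.
client_a: Weights = {0: [1.0, 1.0], 1: [2.0, 2.0], 2: [3.0, 3.0]}
client_b: Weights = {0: [3.0, 3.0], 1: [4.0, 4.0], 2: [5.0, 5.0]}
scores_a = {0: 0.7, 1: 0.2, 2: 0.1}
scores_b = {0: 0.5, 1: 0.1, 2: 0.4}

# Each client uploads only its K = 2 most important keypoint blocks.
uploads = [
    select_top_k(client_a, scores_a, k=2),
    select_top_k(client_b, scores_b, k=2),
]

# New global weights (w_t^s in the caption).
new_global = aggregate(global_w, uploads)
print(new_global)
print(count_params(uploads[0]), "of", count_params(client_a), "parameters uploaded")
```

In this toy setup each client sends 4 of its 6 parameters. Keypoint 0 is averaged across both clients, while keypoints 1 and 2 each take the value from the single client that selected them.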