Action Recognition based Industrial Safety Violation Detection
Surya N Reddy, Vaibhav Kurrey, Mayank Nagar, Gagan Raj Gupta
TL;DR
This work tackles PPE violation detection in industrial environments by conditioning PPE checks on the worker's action, which reduces false alarms. It introduces a novel large-scale industrial action dataset with dense spatio-temporal annotations collected from surveillance footage, and an integrated framework that combines SlowFast-based action recognition with PPE detectors (RetinaNet/Fast R-CNN/YOLOv9) and a compliance module. On a 109-video test set, action-aware PPE detection improves the F1-score by $23\%$ over the PPE-based approach, reaches high clip-level recall ($\approx0.93$) when multi-frame action context is used, and achieves higher recall in a human-vs-model comparison. The approach is viable in real time (roughly $117\text{ ms}$ per frame) and scales to multiple streams, which makes it practical for automated safety monitoring on shop floors. The authors note the limitations of 2D RGB data and plan to incorporate depth sensing for improved spatial reasoning.
Abstract
Proper use of personal protective equipment (PPE) can save the lives of industrial workers, and monitoring it is a widely used application of computer vision in large manufacturing industries. However, most deployed applications generate many false alarms (spurious violations) because they generalize PPE requirements across the industry and across tasks. The key to resolving this issue is to understand the action being performed by the worker and to customize the inference to the specific PPE requirements of that action. In this paper, we propose a system that first employs activity recognition models to understand the action being performed and then uses object detection techniques to check for violations. This leads to a 23% improvement in the F1-score compared to the purely PPE-based approach on our test dataset of 109 videos.
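The two-stage idea in the abstract (recognize the action first, then check only the PPE that action requires) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the action labels, the `REQUIRED_PPE` mapping, the majority-vote smoothing, and the fallback for unknown actions are hypothetical and are not taken from the paper's dataset or compliance module. The action and PPE labels are assumed to come from upstream models, which are not shown.

```python
from collections import Counter

# Hypothetical action -> required PPE mapping (illustrative only, not from the paper).
REQUIRED_PPE = {
    "welding": {"helmet", "gloves", "face_shield"},
    "walking": {"helmet"},
    "desk_work": set(),
}

# Action-agnostic baseline: demand every PPE item regardless of the task.
ALL_PPE = set().union(*REQUIRED_PPE.values())


def smooth_action(frame_actions):
    """Majority vote over per-frame action labels.

    A simple stand-in for multi-frame action context; the actual
    aggregation used by the authors may differ.
    """
    return Counter(frame_actions).most_common(1)[0][0]


def missing_ppe(action, detected_ppe, action_aware=True):
    """Return the set of required PPE items that were not detected."""
    if action_aware:
        # Assumption: unknown actions fall back to the strictest requirement.
        required = REQUIRED_PPE.get(action, ALL_PPE)
    else:
        required = ALL_PPE
    return required - set(detected_ppe)


# A worker who is walking and wearing only a helmet.
action = smooth_action(["walking", "walking", "welding"])
aware = missing_ppe(action, {"helmet"})
agnostic = missing_ppe(action, {"helmet"}, action_aware=False)
print(action, aware, agnostic)
```

In this toy case the action-agnostic check flags missing gloves and a face shield for a worker who is only walking, while the action-aware check reports no violation. This is the kind of false alarm the abstract describes.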
