Dynamic Attention and Bi-directional Fusion for Safety Helmet Wearing Detection

Junwei Feng, Xueyan Fan, Yuyang Chen, Yi Li

TL;DR

The paper tackles real-time safety helmet wearing detection in cluttered construction environments where helmets are small and frequently occluded. It introduces DABFNet, combining a Dynamic Attention Detection Head (DAHead), a Bi-directional Weighted Feature Pyramid Network (BWFPN), and Wise-IoU loss (WIoU) to enhance multi-scale feature fusion and object localization. Experimental results on the SHWD dataset show improved accuracy (notably mAP@0.5:0.95) and reduced computational load, with ablations confirming the effectiveness of each component. This approach offers a practical, edge-friendly solution for construction-site safety monitoring with potential impact on real-world helmet compliance enforcement.

Abstract

Ensuring construction site safety requires accurate, real-time detection of whether workers are wearing safety helmets. The task is difficult because sites are cluttered and densely populated, and because building obstructions leave many targets small or overlapping. This paper proposes a novel algorithm for safety helmet wearing detection that incorporates a dynamic attention mechanism within the detection head to enhance multi-scale perception. The mechanism combines feature-level attention for scale adaptation, spatial attention for spatial localization, and channel attention for task-specific insights, improving small object detection without additional computational overhead. Furthermore, a two-way fusion strategy enables bidirectional information flow, refining feature fusion through adaptive multi-scale weighting and enhancing recognition of occluded targets. Experimental results demonstrate a 1.7% improvement in mAP@[.5:.95] compared to the best baseline while reducing GFLOPs by 11.9% on the larger model sizes. The proposed method surpasses the models it is compared against, providing an efficient and practical solution for real-world construction safety monitoring.
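The headline metric, mAP@[.5:.95], averages average precision over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05 (the COCO convention). The sketch below illustrates only that thresholding idea and is not the paper's evaluation code; the function names `iou`, `thresholds_passed`, and `mean_over_thresholds` are hypothetical, and boxes are assumed to be in `(x1, y1, x2, y2)` corner format.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style thresholds: 0.50, 0.55, ..., 0.95 (ten values).
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, gt_box):
    """Thresholds at which this prediction would count as a match."""
    overlap = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if overlap >= t]


def mean_over_thresholds(ap_per_threshold):
    """Average per-threshold AP values (dict keyed by threshold) into one score."""
    return sum(ap_per_threshold[t] for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)
```

A prediction that overlaps its ground-truth box only loosely still counts at the 0.50 threshold but is rejected at the stricter ones, so gains on this metric reflect tighter localization, which matters most for small targets such as helmets.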

Paper Structure

This paper contains 20 sections, 8 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: DABFNet model framework. The Backbone is responsible for feature extraction through successive convolutional and C2f layers, culminating in a Spatial Pyramid Pooling Fast (SPPF) module that consolidates multi-scale information. The Neck integrates features using the BWFPN and upsampling modules. It connects multiple C2f modules and BWFPN layers to enhance feature representation and improve the network’s detection capability. The Head includes a series of DAHead Blocks that predict bounding boxes and classify objects.
  • Figure 2: Dynamic Attention Detection Head Block. The block comprises three components: $\pi_{L}$, responsible for local feature processing; $\pi_{S}$, which refines spatial features through convolution and offset adjustments; and $\pi_{C}$, which focuses on channel-wise feature modulation.
  • Figure 3: Comparison of different feature fusion methods. (a) FPN, which uses a simple top-down pathway for multi-scale feature fusion; (b) PANet, which adds a bottom-up pathway to enhance information flow; (c) NAS-FPN, which leverages neural architecture search to create an optimized, multi-level feature fusion structure with repeated blocks; (d) our BWFPN, which incorporates bidirectional connections and repeated blocks for efficient and adaptive feature fusion across different scales.
  • Figure 4: Dataset label distribution. (a) shows the instance counts of each label, with "person" significantly outnumbering "hat"; (b) presents the bounding box distribution, revealing the common positions and sizes of annotated objects; (c) indicates the spatial distribution of bounding boxes on the x and y axes; (d) displays the relationship between bounding box width and height, suggesting variations in object sizes within the dataset.
  • Figure 5: Comparison of models of different sizes in terms of mAP@0.5 and GFLOPs. The DABFNet model (in red) consistently outperforms YOLOv8 (in green) across all sizes, achieving higher mAP@0.5 values at comparable computational costs.
  • ...and 4 more figures
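The adaptive weighting that the BWFPN caption describes can be illustrated with a small sketch. The exact fusion formula is not given in this summary, so the code below assumes a BiFPN-style fast normalized fusion, in which each input map receives a non-negative learnable scalar weight and the weights are normalized by their sum. The function name `fast_normalized_fusion` and the `eps` default are hypothetical, and plain nested lists stand in for feature tensors.

```python
def fast_normalized_fusion(features, raw_weights, eps=1e-4):
    """Fuse same-shaped 2-D feature maps with normalized non-negative weights.

    Assumption: BiFPN-style fusion, out = sum(w_i * f_i) / (eps + sum(w_i)),
    which may differ from the paper's BWFPN formulation.
    features: list of maps (lists of rows), already resized to one scale.
    raw_weights: one scalar per map; a ReLU keeps each weight >= 0.
    """
    if len(features) != len(raw_weights):
        raise ValueError("need exactly one weight per feature map")
    weights = [max(0.0, w) for w in raw_weights]  # ReLU on the raw weights
    total = sum(weights) + eps                    # eps avoids division by zero
    rows, cols = len(features[0]), len(features[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for fmap, w in zip(features, weights):
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += (w / total) * fmap[r][c]
    return fused


# Usage: fuse a top-down map and a bottom-up map at the same pyramid level.
top_down = [[1.0, 1.0], [1.0, 1.0]]
bottom_up = [[3.0, 3.0], [3.0, 3.0]]
fused = fast_normalized_fusion([top_down, bottom_up], [1.0, 1.0])
```

In a trained network the raw weights would be learned parameters, so a scale that contributes little at a given level is down-weighted automatically instead of being summed in at full strength as in a plain FPN.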