A Flying Bird Object Detection Method for Surveillance Video

Ziwei Sun, Zexi Hua, Hengchao Li, Yan Li

TL;DR

This work tackles flying bird detection in surveillance video, where single-frame features are often weak, most birds are small, and their shapes fill their bounding boxes irregularly. It introduces Co-Attention-FA to fuse spatio-temporal cues across consecutive frames, and FBOD-Net, which down-samples and then up-samples to produce a single large feature layer that combines fine spatial detail with a large receptive field, a representation suited to small objects. To handle irregular box shapes, it proposes SimOTA-OC, an IoU-based dynamic label allocator for one-category detection, paired with a tailored loss that omits the category term. Experiments on traction-substation surveillance videos show that FBOD-SV achieves a state-of-the-art AP50 (≈0.762) at real-time speed (~59.9 fps), supporting the effectiveness of multi-frame feature aggregation and dynamic labeling for small, irregular flying birds in surveillance contexts.
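
SimOTA-OC is only summarized here, so the following is a minimal Python sketch of SimOTA-style dynamic-k label assignment with an IoU-only cost. It assumes that dropping the classification cost is what makes the allocator one-category. The names `box_iou` and `assign_one_category`, the `-log(IoU)` cost, the top-`q` candidate count, and the rule of ignoring non-overlapping predictions (a simplified stand-in for SimOTA's center prior) are illustrative assumptions, not the paper's implementation.

```python
import math


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_one_category(preds, gts, q=10):
    """Hypothetical SimOTA-style dynamic-k assignment with no class term.

    preds: predicted boxes, one per anchor / grid cell.
    gts:   ground-truth bird boxes.
    Returns {anchor index: gt index} for the positive anchors.
    """
    candidates = {}  # anchor index -> list of (cost, gt index)
    for g, gt in enumerate(gts):
        row = [box_iou(p, gt) for p in preds]
        # Dynamic k: sum of the q best IoUs, at least one positive per GT.
        top_q = sorted(row, reverse=True)[:q]
        dynamic_k = max(1, int(sum(top_q)))
        # Only overlapping predictions are candidates (assumed stand-in for
        # SimOTA's center prior); cost is -log(IoU), lower is better.
        overlapping = [i for i in range(len(preds)) if row[i] > 0]
        overlapping.sort(key=lambda i: -math.log(row[i]))
        for i in overlapping[:dynamic_k]:
            candidates.setdefault(i, []).append((-math.log(row[i]), g))
    # An anchor claimed by several GTs keeps only its cheapest match.
    return {i: min(matches)[1] for i, matches in candidates.items()}
```

For example, with `preds = [(0, 0, 10, 10), (0, 0, 10, 9), (0, 0, 10, 8), (50, 50, 60, 60), (52, 52, 62, 62)]` and `gts = [(0, 0, 10, 10), (50, 50, 60, 60)]`, the first bird's top IoUs sum to 2.7, so it receives two positives, while the second bird's sum to about 1.47 and it receives one: `{0: 0, 1: 0, 3: 1}`. Letting the number of positives follow overlap quality, rather than a fixed box-shape rule, is the property that plausibly helps with birds that fill their boxes irregularly.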

Abstract

To address the specific characteristics of flying bird objects in surveillance video, namely features that are typically not obvious in a single frame, small size in most instances, and asymmetric shapes, this paper proposes a Flying Bird Object Detection method for Surveillance Video (FBOD-SV). First, a new feature aggregation module, the Correlation Attention Feature Aggregation (Co-Attention-FA) module, is designed to aggregate the features of a flying bird object according to its correlation across multiple consecutive frames. Second, a Flying Bird Object Detection Network (FBOD-Net) with down-sampling followed by up-sampling is designed; it uses a large feature layer that fuses fine spatial information with large-receptive-field information to detect bird objects whose scales vary but are mostly small. Finally, the SimOTA dynamic label allocation method is applied to one-category object detection, yielding the SimOTA-OC dynamic label strategy, which addresses the difficult label allocation problem caused by irregularly shaped flying bird objects. The performance of FBOD-SV is validated on experimental datasets of flying bird objects in traction substation surveillance videos, and the results show that FBOD-SV effectively improves the detection performance of flying bird objects in surveillance video.
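
Co-Attention-FA is described here only as aggregating a bird's features according to its correlation across consecutive frames. As a rough illustration of that idea, and not the module shown in Figure 3, the Python sketch below weights each frame's feature vector at a location by a softmax over its cosine similarity to a reference frame, then sums the weighted vectors. The function names, the cosine-similarity score, the `temperature` parameter, and the nested-list feature layout (frames × locations × channels) are all assumptions made for the example.

```python
import math


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + eps)


def correlation_aggregate(frames, ref_index, temperature=1.0):
    """Hypothetical correlation-weighted aggregation over T frames.

    frames:    list of T feature maps; each map is a list of P locations,
               each location a list of C channel values.
    ref_index: index of the reference frame (e.g. the middle frame).
    Returns one aggregated map with the same P x C layout.
    """
    ref = frames[ref_index]
    out = []
    for p, ref_vec in enumerate(ref):
        # Correlation of every frame with the reference at this location.
        scores = [cosine(ref_vec, f[p]) / temperature for f in frames]
        # Numerically stable softmax over the frame axis.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the per-frame features, channel by channel.
        fused = [sum(w * f[p][c] for w, f in zip(weights, frames))
                 for c in range(len(ref_vec))]
        out.append(fused)
    return out
```

With five frames, as in Figure 1(a), and the middle frame as reference, a frame whose features disagree with the reference is down-weighted rather than discarded, so weak single-frame evidence can still be reinforced by correlated neighbors.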

Paper Structure

This paper contains 22 sections, 6 equations, 12 figures, and 6 tables.

Figures (12)

  • Figure 1: Characteristics of flying bird objects in surveillance videos. (a) Right: a small bird with weak features; left: screenshots of the bird in five consecutive frames. (b) Simplified diagram of the distribution of flying birds in surveillance video. Birds are evenly distributed over the surveillance area (areas A, B, and C: birds in area A are large objects, birds in area B are medium-sized objects, and birds in area C are small objects). Because area C is much larger than areas A and B, birds are mostly small objects. (c) In the right image, the green box is the bounding box and the red point is its center. Because of the bird's shape, the object does not fill its bounding box regularly in most cases.
  • Figure 2: Overview of the proposed FBOD-SV. (a) The Co-Attention-FA unit. (b) The FBOD-Net model. (c) The SimOTA-OC dynamic label assignment unit. (d) Model training unit (loss function).
  • Figure 3: Diagram of the Co-Attention-FA module.
  • Figure 4: The FBOD-Net.
  • Figure 5: Feature fusion module in FBOD-Net.
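
Figure 2(d) is the training (loss function) unit, and the summary states that the loss omits a category term. The Python sketch below assumes a YOLOX-style split into a box term (here `1 - IoU` over the positive predictions chosen by the label assigner) and a binary objectness term over all anchors, both normalized by the number of positives. The function names and the `box_weight` value are hypothetical; the paper's exact terms and weights may differ.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one probability p and target y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def one_category_loss(pos_ious, obj_probs, obj_targets, box_weight=5.0):
    """Hypothetical detection loss with no classification term.

    pos_ious:    IoU between each positive prediction and its assigned GT.
    obj_probs:   predicted objectness probability for every anchor.
    obj_targets: 1 for anchors labelled positive by the assigner, else 0.
    box_weight:  assumed weight on the box term (illustrative value).
    """
    num_pos = max(1, len(pos_ious))
    box_loss = sum(1.0 - iou for iou in pos_ious) / num_pos
    obj_loss = sum(bce(p, y) for p, y in zip(obj_probs, obj_targets)) / num_pos
    return box_weight * box_loss + obj_loss
```

For a single positive with IoU 0.5 and objectness 0.5, this gives `5.0 * 0.5 + ln 2 ≈ 3.19`. With only one category, the objectness score presumably already separates bird from background, which is why a separate class term can be dropped.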