HS-FPN: High Frequency and Spatial Perception FPN for Tiny Object Detection

Zican Shi, Jing Hu, Jie Ren, Hengkang Ye, Xuyang Yuan, Yan Ouyang, Jia He, Bo Ji, Junyu Guo

TL;DR

This work addresses the persistent challenge of tiny object detection by augmenting Feature Pyramid Networks (FPNs) with two novel modules: High Frequency Perception (HFP) and Spatial Dependency Perception (SDP). HFP enriches tiny-object features by applying a high-pass filter to extract high-frequency content and deriving channel-wise and spatial-wise attention from these high-frequency responses; SDP introduces pixel-level cross-attention between adjacent feature maps to capture spatial dependencies that standard FPNs miss. Ablation studies show that each module improves detection on its own and that the combination performs best, yielding substantial gains over FPN across multiple detectors on AI-TOD and DOTA-mini10. The approach delivers consistent improvements at a modest increase in computational cost and can be embedded into existing FPN-based architectures, making detectors more robust for tiny objects, with practical value for aerial imagery and other tiny object detection (TOD) applications.
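The high-pass filtering step can be illustrated with a small sketch. It assumes, following the DCT-based filtering shown in Figure 1, that low frequencies are removed by zeroing the top-left $\alpha\times\alpha$ block of a 2D DCT and then inverting the transform. The function names and the parameter `alpha` are hypothetical, the DCT is built by hand with NumPy to keep the example self-contained, and this is not the authors' implementation.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix of size n x n (rows index frequencies)."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # spatial index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)  # rescale the DC row so the matrix is orthonormal
    return c


def dct_high_pass(x: np.ndarray, alpha: int) -> np.ndarray:
    """Remove the alpha x alpha lowest-frequency DCT coefficients of a 2D map.

    Hypothetical helper: assumes low frequencies sit in the top-left corner
    of the DCT coefficient grid, as illustrated in Figure 1.
    """
    h, w = x.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    coeffs = ch @ x @ cw.T          # forward 2D DCT
    coeffs[:alpha, :alpha] = 0.0    # zero the low-frequency (top-left) block
    return ch.T @ coeffs @ cw       # inverse 2D DCT (matrices are orthonormal)
```

Because the orthonormal DCT-II matrix is inverted by its transpose, the inverse transform is just two matrix products. A constant (purely low-frequency) patch maps to zero once the DC coefficient is removed, while sharp, pixel-sized structures such as tiny objects mostly survive; growing `alpha` eventually starts removing the target's own energy, consistent with the rise-then-fall SCR trend reported in Figure 1.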

Abstract

The introduction of the Feature Pyramid Network (FPN) has significantly improved object detection performance. However, detecting tiny objects remains challenging because their features occupy only a very small proportion of the feature maps. Although FPN integrates multi-scale features, it does not directly enhance or enrich the features of tiny objects. Furthermore, FPN lacks spatial perception ability. To address these issues, we propose a novel High Frequency and Spatial Perception Feature Pyramid Network (HS-FPN) with two innovative modules. First, we design a high frequency perception module (HFP) that generates high-frequency responses through high-pass filters. These high-frequency responses are used as mask weights from both spatial and channel perspectives to enrich and highlight the features of tiny objects in the original feature maps. Second, we develop a spatial dependency perception module (SDP) to capture the spatial dependencies that FPN lacks. Our experiments demonstrate that detectors based on HS-FPN show competitive advantages over state-of-the-art models on the AI-TOD dataset for tiny object detection.
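The idea of reusing high-frequency responses as channel and spatial mask weights can be sketched as follows. This is a simplified stand-in for HFP's channel and spatial paths under stated assumptions: the channel weights come from the mean absolute high-frequency response per channel, the spatial weights from the absolute response summed over channels, both are squashed with a sigmoid, and the two reweighted maps are summed pixel by pixel. The learned convolutions and exact activations of the real HFP are omitted, and `hfp_enhance` is a hypothetical name.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def hfp_enhance(feat: np.ndarray, high_freq: np.ndarray) -> np.ndarray:
    """Reweight a (C, H, W) feature map with masks derived from its high-frequency response.

    Hypothetical simplification of HFP's channel path and spatial path:
    no learned layers; sigmoid gating is chosen for illustration only.
    """
    energy = np.abs(high_freq)
    # Channel path: one weight per channel from its mean high-frequency energy.
    channel_w = sigmoid(energy.mean(axis=(1, 2)))[:, None, None]  # (C, 1, 1)
    # Spatial path: one weight per pixel from the energy summed over channels.
    spatial_w = sigmoid(energy.sum(axis=0, keepdims=True))        # (1, H, W)
    # Hadamard products with the original map, then pixel-by-pixel summation.
    return feat * channel_w + feat * spatial_w
```

With these choices every weight lies in [0.5, 1), so each feature is rescaled by a factor in [1, 2) that grows with the local and per-channel high-frequency energy, which is the intended effect of highlighting fine detail such as tiny objects.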

Paper Structure

This paper contains 20 sections, 3 equations, 17 figures, and 7 tables.

Figures (17)

  • Figure 1: Target region after filtering out low-frequency components over different ranges (top), the corresponding 3D surface representations (bottom), and the SCR of the target (red text). (a) The original image containing a tiny ship target; (b) the unfiltered target neighborhood; (c)-(d) the neighborhood after filtering out the low-frequency components in the top-left corner of the DCT coefficients, with the filtered region growing from $0\times0$ to $100\times100$. The SCR first increases and then decreases as the filtering range expands.
  • Figure 2: The overall structure of HS-FPN and details of the lateral connection. HFP denotes the high frequency perception module and SDP the spatial dependency perception module. Convolution layers are omitted for clarity. Note that SDP does not appear at $P_{5}$.
  • Figure 3: The design of HFP, which consists of a high frequency feature generator, a channel path (CP), and a spatial path (SP). Note that the high frequency feature extractor within the dashed box is implemented only at $\{P_{2}, P_{3}\}$, and the two high frequency features in the diagram are identical. The diagram's symbols denote the Hadamard product, feature concatenation, and pixel-by-pixel summation; $\sigma$ denotes the ReLU function.
  • Figure 4: Illustration of the high-pass filter.
  • Figure 5: Computation details of different attention methods. (a) Lack of spatial dependency in FPN; (b) attention between pixels, as used in SDP; (c) attention between feature blocks, as used in ViT.
  • ...and 12 more figures
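The pixel-level attention contrasted in Figure 5(b) can be sketched as plain cross-attention in which every pixel of one pyramid level queries every pixel of an adjacent level. The sketch assumes the adjacent map has already been resized to the same spatial size, uses the raw features as queries, keys, and values (no learned projections), and attends globally; the paper's SDP may differ in these details, and `pixel_cross_attention` is a hypothetical name.

```python
import numpy as np


def pixel_cross_attention(query_map: np.ndarray, kv_map: np.ndarray):
    """Global pixel-to-pixel cross-attention between two (C, H, W) maps.

    Hypothetical sketch: raw features act as queries, keys and values;
    kv_map is assumed to be already resized to query_map's spatial size.
    Returns the attended (C, H, W) map and the (H*W, H*W) weight matrix.
    """
    c, h, w = query_map.shape
    if kv_map.shape != (c, h, w):
        raise ValueError("maps must share the same (C, H, W) shape")
    q = query_map.reshape(c, h * w).T  # (N, C): one query per pixel
    k = kv_map.reshape(c, h * w).T     # (N, C): one key per pixel
    v = k                              # values reuse the keys (no projection)
    scores = q @ k.T / np.sqrt(c)      # (N, N) scaled dot products
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over key pixels
    out = weights @ v                  # (N, C)
    return out.T.reshape(c, h, w), weights
```

Each row of the returned weight matrix is a distribution over the $H \times W$ positions of the adjacent map, which is the pixel-to-pixel dependency that FPN lacks (Figure 5(a)). The global form shown here costs $(HW)^2$ memory, which is one reason block-wise or windowed variants are common in practice.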