Renormalized Connection for Scale-preferred Object Detection in Satellite Imagery

Fan Zhang, Lingling Li, Licheng Jiao, Xu Liu, Fang Liu, Shuyuan Yang, Biao Hou

TL;DR

This article designs a knowledge discovery network (KDN) that implements renormalization group theory for efficient feature extraction (FE), abstracts a class of renormalized connections (RCs) with different connection strengths, called n21C, and generalizes it to feature pyramid network (FPN)-based multibranch detectors.

Abstract

Satellite imagery, due to its long-range imaging, gives rise to a variety of scale-preferred tasks, such as the detection of tiny or small objects, which makes precise localization and detection of small objects of interest challenging. In this article, we design a Knowledge Discovery Network (KDN) to implement renormalization group theory (RGT) for efficient feature extraction. The renormalized connection (RC) on the KDN enables "synergistic focusing" of multi-scale features. Based on our observations of the KDN, we abstract a class of RCs with different connection strengths, called n21C, and generalize it to FPN-based multi-branch detectors. In a series of FPN experiments on scale-preferred tasks, we found that the "divide-and-conquer" idea of FPN severely hampers the detector's learning in the right direction, owing to the large number of large-scale negative samples and interference from background noise. Moreover, these negative samples cannot be eliminated by the focal loss function. RCs extend the multi-level "divide-and-conquer" mechanism of FPN-based detectors to a wide range of scale-preferred tasks and enable multi-level features to act synergistically on the specific learning goal. In addition, interference activations in two aspects are greatly reduced, and the detector learns in a more correct direction. Extensive experiments on 17 well-designed detection architectures embedded with n21Cs, across five different levels of scale-preferred tasks, validate the effectiveness and efficiency of the RCs. In particular, the simplest linear form of RC, E421C, performs well in all tasks and satisfies the scaling property of RGT. We hope that our approach will transfer a large number of well-designed detectors from the computer vision community to the remote sensing community.

Paper Structure (38 sections, 12 equations, 14 figures, 14 tables)


Figures (14)

  • Figure 1: The saliency maps of YOLOv8-PAFPN describe the importance of features. The FPN series of extractors with focal loss are unable to adapt to difficult scale-preferred tasks. The initial "divide-and-conquer" mechanism across pyramid levels introduces a large number of interfering activations (areas with red activation values) in tiny object detection tasks. The figure shows saliency maps of the feature activations at three pyramid levels of the PAFPN feature extractor for two types of scale-preferred tasks. (a) shows a purely tiny object detection task: the $P_4$ and $P_5$ feature maps highlight large background objects (in red boxes) that have similar visual appearance (color and shape) but different scales and object classes. This greatly overwhelms the detection focus of $P_3$. (b) shows a small object detection task, where the $P_3$ and $P_4$ levels both focus on the same-sized objects, which ought to be detected by $P_4$. The $P_5$ level, however, attends mostly to large irrelevant regions in the background. These observations suggest that the signal-to-noise ratio of FPNs is lower on scale-preferred tasks than on scale-diversified tasks, because the large volume of interfering activations prevents the earlier levels ($P_3$ in (a); $P_3$ and $P_4$ in (b)) from focusing on the small objects that are the objective.
  • Figure 2: Three types of feature connection methods. (a) The residual connection is designed to deepen the network by applying a non-linear computation, i.e., the ReLU unit, after summing the input and the building-block features. (b) collects the design structures of feature extractors with different connection layers and connection densities, such as GiraffeDet [jiang2022giraffedet], RTMDet [lyu2022rtmdet], NAS-FPN [ghiasi2019fpn], BiFPN [tan2020efficientdet], and GFPN [xiao2023global]. The lateral connection [fpn] is an independent connection module consisting of summation or concatenation, a ReLU layer, a $3\times3$ convolution layer, and a batch normalization layer. In addition to lateral connections between feature maps of the same size, there are also connections between feature maps of different sizes (indicated by light blue arrows). (c) shows the Renormalized Connection (n21C), which renormalizes the output of the feature extractor to form a new input to the head network. The example shown is an Economic n21C, embedded in Feature Level 1 only. It takes only the output features of the feature extractor as inputs and does not change the extractor's internal connection structure. Moreover, this kind of Renormalized Connection is linear: apart from the projection operation, it contains no additional non-linear operations, learnable layers, or normalization layers, yet it works well.
  • Figure 3: The framework of the Knowledge Discovery Network (KDN).
  • Figure 4: A set of independent feature bases in a given feature space.
  • Figure 5: The architecture of the detectors embedded with the Renormalized Connection (economical form). Note that the feature pyramid networks have different connection layers and densities, as shown in Fig. 2(b). For simplicity, the fine structure of the feature extractor and Level 4 of Swin Transformer and ResNet are not shown.
  • ...and 9 more figures
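
The Figure 2(c) caption describes the economic renormalized connection as a purely linear merge of the feature extractor's output levels into a single level that then feeds the head. A minimal sketch of that idea follows. It is an illustration under stated assumptions, not the authors' implementation: nearest-neighbor upsampling stands in for the projection operation, the merge is an unweighted sum, each coarser level is assumed to be exactly half the resolution of the previous one, and the names `upsample_nearest` and `merge_into_finest` are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # a single-channel feature map, rows x cols


def upsample_nearest(feat: Grid, factor: int) -> Grid:
    """Nearest-neighbor upsampling: repeat each value factor x factor times."""
    out: Grid = []
    for row in feat:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def merge_into_finest(levels: List[Grid]) -> Grid:
    """Hypothetical linear many-to-one merge onto the finest level.

    levels[0] is the finest map; each later map is ASSUMED to be exactly
    half the resolution of the one before it. No non-linearity and no
    learnable parameter is involved (an assumption about the projection).
    """
    target = [list(row) for row in levels[0]]
    for i, feat in enumerate(levels[1:], start=1):
        up = upsample_nearest(feat, 2 ** i)
        if len(up) != len(target) or len(up[0]) != len(target[0]):
            raise ValueError(f"level {i} does not match the finest resolution")
        for r, row in enumerate(up):
            for c, v in enumerate(row):
                target[r][c] += v  # plain summation keeps the merge linear
    return target


# Toy pyramid outputs standing in for P3, P4, P5.
p3 = [[1.0] * 4 for _ in range(4)]   # 4x4
p4 = [[10.0, 20.0], [30.0, 40.0]]    # 2x2
p5 = [[100.0]]                       # 1x1
merged = merge_into_finest([p3, p4, p5])
```

In this toy setup only `merged` would be passed to the detection head, which mirrors the caption's statement that the connection is embedded in Feature Level 1 only and leaves the extractor's internal structure untouched.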