Visible and Clear: Finding Tiny Objects in Difference Map

Bing Cao, Haiyu Yao, Pengfei Zhu, Qinghua Hu

TL;DR

The paper tackles tiny object detection by addressing the information loss that occurs during backbone feature extraction. It introduces SR-TOD, which inserts a reconstruction head that rebuilds the input image from the FPN feature map $P2$, $I_r = \sigma(Conv(Up(Up(P2))))$, and compares the result with the original image $I_o$ to produce a pixel-wise difference map, $D = \text{Mean}_{channel}(\lvert I_r - I_o \rvert)$. A Difference Map Guided Feature Enhancement (DGFE) module then uses $D$ as a prior: it derives weights from $P2$ and the difference map to form an attention matrix $M$, yielding the enhanced feature map $P2' = M \otimes P2 = (Reweighting(P2) \otimes Filtration(D)) \otimes P2$, which improves tiny-object visibility across detectors. Extensive experiments on DroneSwarms, VisDrone2019, and AI-TOD show consistent performance gains, and ablations confirm the contribution of each component. The authors also introduce DroneSwarms, an anti-UAV dataset for challenging tiny-object scenarios.
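The difference-map formulas above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the authors' implementation: `Up` is replaced by nearest-neighbour 2x upsampling, `Conv` by a 1x1 convolution with random weights, and $P2$ is assumed to sit at 1/4 of the input resolution, so that two upsampling steps restore the image size. The helper names and toy shapes are hypothetical.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) array (assumed stand-in for Up)."""
    return x.repeat(2, axis=-2).repeat(2, axis=-1)

def conv1x1(x, weight, bias):
    """1x1 convolution (assumed stand-in for Conv): mixes channels per pixel.
    weight has shape (C_out, C_in), bias has shape (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def difference_map(p2, image, weight, bias):
    """I_r = sigmoid(Conv(Up(Up(P2)))), then D = mean over channels of |I_r - I_o|."""
    i_r = sigmoid(conv1x1(upsample2x(upsample2x(p2)), weight, bias))
    return np.abs(i_r - image).mean(axis=0)

rng = np.random.default_rng(0)
p2 = rng.normal(size=(8, 4, 4))          # toy P2: 8 channels at 1/4 resolution
image = rng.uniform(size=(3, 16, 16))    # toy input image I_o with values in [0, 1)
weight = rng.normal(size=(3, 8)) * 0.1   # random 1x1 conv weights (untrained)
bias = np.zeros(3)

d = difference_map(p2, image, weight, bias)
print(d.shape)  # (16, 16)
```

Because $I_r$ passes through a sigmoid and the toy image lies in $[0, 1)$, every entry of `d` falls between 0 and 1. Large values mark pixels that the reconstruction fails to reproduce.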

Abstract

Tiny object detection is one of the key challenges in the field of object detection. The performance of most generic detectors decreases dramatically in tiny object detection tasks. The main challenge lies in extracting effective features of tiny objects. Existing methods usually perform generation-based feature enhancement, which is seriously affected by spurious textures and artifacts, making it difficult to make the tiny-object-specific features visible and clear for detection. To address this issue, we propose a self-reconstructed tiny object detection (SR-TOD) framework. For the first time, we introduce a self-reconstruction mechanism in the detection model and discover a strong correlation between it and tiny objects. Specifically, we place a reconstruction head in the neck of a detector and construct a difference map between the reconstructed image and the input, which shows high sensitivity to tiny objects. This inspires us to enhance the weak representations of tiny objects under the guidance of the difference maps, thus improving the visibility of tiny objects for the detectors. Building on this, we further develop a Difference Map Guided Feature Enhancement (DGFE) module to make the tiny feature representation clearer. In addition, we propose a new multi-instance anti-UAV dataset, called DroneSwarms, which contains a large number of tiny drones with the smallest average size to date. Extensive experiments on the DroneSwarms dataset and other datasets demonstrate the effectiveness of the proposed method. The code and dataset will be publicly available.

Paper Structure

This paper contains 13 sections, 6 equations, 4 figures, and 7 tables.

Figures (4)

  • Figure 1: Visualization of the image self-reconstruction mechanism. The results are from Cascade R-CNN (Cai and Vasconcelos, 2018). (a) shows the entire image. To demonstrate the effect more clearly, (b) zooms in on the local region in the red box. (c) shows the visual heatmap of the corresponding region in the feature map of FPN P2. (d) is the reconstructed image. (e) is the difference map. The yellow dotted box highlights the tiny drone whose signal is almost wiped out in the feature map.
  • Figure 2: Overall model architecture. RH refers to the reconstruction head, and DGFE to the Difference Map Guided Feature Enhancement module. $P2$ to $P6$ denote the feature maps extracted from FPN. $P2'$ is the enhanced feature map. $M$ is the element-wise attention matrix.
  • Figure 3: Visualizations of the pixel difference maps and high-frequency difference maps.
  • Figure 4: Visualizations on DroneSwarms. The first row shows local regions of the input images. The second and third rows show the corresponding feature maps from Cascade R-CNN without and with SR-TOD.
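The enhancement step named in Figure 2, $P2' = M \otimes P2$ with $M = Reweighting(P2) \otimes Filtration(D)$, can be illustrated with a small NumPy sketch. The text here does not specify the internals of `Reweighting` or `Filtration`, so both bodies below are assumptions chosen only to make the shapes concrete: `reweighting` is a hypothetical channel gate (global average pooling, a linear map, then a sigmoid), and `filtration` is a hypothetical hard threshold applied after average-pooling $D$ down to the resolution of $P2$. The threshold value, stride, and weights are likewise illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reweighting(p2, w):
    """Hypothetical channel weights of shape (C, 1, 1):
    sigmoid of a linear map applied to globally average-pooled P2."""
    pooled = p2.mean(axis=(1, 2))              # (C,)
    return sigmoid(w @ pooled)[:, None, None]

def filtration(d, threshold, stride=4):
    """Hypothetical spatial mask of shape (1, H/stride, W/stride):
    average-pool D to the P2 resolution, then keep cells above the threshold."""
    h, w = d.shape
    d_small = d.reshape(h // stride, stride, w // stride, stride).mean(axis=(1, 3))
    return (d_small > threshold).astype(float)[None]

def dgfe(p2, d, w, threshold):
    """P2' = M * P2 with M = Reweighting(P2) * Filtration(D), all element-wise."""
    m = reweighting(p2, w) * filtration(d, threshold)   # broadcasts to (C, H/4, W/4)
    return m * p2, m

rng = np.random.default_rng(0)
p2 = rng.normal(size=(8, 4, 4))       # toy P2 feature map
w = rng.normal(size=(8, 8)) * 0.1     # random weights for the channel gate
d = np.zeros((16, 16))
d[4:8, 8:12] = 0.9                    # one high-difference patch (a toy "tiny object")

p2_enh, m = dgfe(p2, d, w, threshold=0.5)
print(p2_enh.shape, m.shape)  # (8, 4, 4) (8, 4, 4)
```

In this toy setup only the P2 cell under the bright patch, at row 1 and column 2, survives. Every other location is zeroed by the hard mask, which follows from applying the formula literally with a binary filter.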