Lost in UNet: Improving Infrared Small Target Detection by Underappreciated Local Features

Wuzhou Quan, Wei Zhao, Weiming Wang, Haoran Xie, Fu Lee Wang, Mingqiang Wei

TL;DR

HintU is a novel network that recovers the local features lost by various UNet-based methods for effective infrared small target detection (ISTD). It introduces the “Hint” mechanism for the first time, i.e., leveraging prior knowledge of target locations to highlight critical local features.

Abstract

Targets in infrared images are often very small because of the long-distance imaging mechanism. UNet and its variants, which are popular detection backbones, downsample local features early and lose them irreversibly, leading to both missed and false detections of small targets in infrared images. We propose HintU, a novel network that recovers the local features lost by various UNet-based methods for effective infrared small target detection. HintU makes two key contributions. First, it introduces the "Hint" mechanism for the first time, i.e., leveraging prior knowledge of target locations to highlight critical local features. Second, it improves the mainstream UNet-based architecture to preserve target pixels even after downsampling. HintU can shift the focus of various networks (e.g., vanilla UNet, UNet++, UIUNet, MiM+, and HCFNet) from irrelevant background pixels to a more restricted area from the very beginning. Experimental results on three datasets (NUDT-SIRST, SIRSTv2, and IRSTD1K) demonstrate that HintU enhances the performance of existing methods at an additional cost of only 1.88 ms (on an RTX Titan). Additionally, the explicit constraints of HintU enhance the generalization ability of UNet-based methods. Code is available at https://github.com/Wuzhou-Quan/HintU.

Paper Structure

This paper contains 17 sections, 11 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: By visualizing the residual between a simple max pooling of the image and the original image, we observe that targets submerged in high-response regions gain a more distinct pattern. This newly acquired pattern is widely applicable, as evidenced by several visualizations of challenging scenarios. To improve visual clarity, the regions highlighted in red and blue are magnified. (a) The target is submerged in low-contrast surroundings, with the image exhibiting a globally high response value difference. (b) The distribution of image intensities appears chaotic, with an uneven distribution of strong response points. The target registers a response value lower than the global maximum. (c) The global response is relatively uniform, featuring numerous targets with different degrees of distinguishability.
  • Figure 2: An overview of the proposed HintU for infrared small target detection. It consists of two primary modules: a prefixed Hint network, which generates representations based on hints derived from the original image, and a UNet-like network tasked with inferring the final results.
  • Figure 3: Several UNet-like network structures.
  • Figure 4: An overview of HintO for infrared small target detection. In comparison to HintU, HintO completely excludes the original image when generating Hint representations, facilitating a direct assessment of the effectiveness of the Hint concept itself. For enhanced visibility, highlighted areas are enlarged.
  • Figure 5: Summary statistics of the three datasets, including (a) the ratio of the number of global high-response pixels to the total number of pixels, (b) the ratio of the pixel count of each target to the total number of pixels, and (c) the count of targets in each frame. The x-axis denotes the value; the y-axis shows the percentage of the total.
  • ...and 6 more figures
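
The residual described in the Figure 1 caption (max pooling versus the original image) can be illustrated with a small, dependency-free sketch. This is not the HintU implementation: the 3×3 kernel, stride 1, border clipping, the sign convention (pooled minus original), and the function names `max_pool_same` and `pooling_residual` are all assumptions made for illustration, since the text above does not specify them.

```python
from typing import List

Image = List[List[float]]


def max_pool_same(image: Image, k: int = 3) -> Image:
    """Stride-1 k x k max pooling (assumed settings).

    Windows are clipped at the image borders, so the output has the
    same height and width as the input.
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    h, w = len(image), len(image[0])
    r = k // 2
    pooled: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            # Collect every pixel of the (clipped) window centred on (y, x).
            window = [
                image[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


def pooling_residual(image: Image, k: int = 3) -> Image:
    """Element-wise difference: max-pooled image minus original image."""
    pooled = max_pool_same(image, k)
    return [
        [p - v for p, v in zip(pooled_row, image_row)]
        for pooled_row, image_row in zip(pooled, image)
    ]


# Toy example (hypothetical data): one bright pixel on a flat background.
toy = [[0.0] * 7 for _ in range(7)]
toy[3][3] = 1.0
residual = pooling_residual(toy)
```

In this toy case the residual is zero at the bright pixel itself and equals the target intensity at its eight neighbours, so an isolated point target turns into a ring. That is one simple way to see why a small target can acquire a more distinct pattern in such a residual map; real infrared frames would normally be processed with an array library rather than nested lists.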