HCF-Net: Hierarchical Context Fusion Network for Infrared Small Object Detection

Shibiao Xu, ShuChen Zheng, Wenhao Xu, Rongtao Xu, Changwei Wang, Jiguang Zhang, Xiaoqiang Teng, Ao Li, Li Guo

TL;DR

HCF-Net tackles infrared small object detection by preserving multi-scale context through three modules: Parallelized Patch-Aware Attention (PPA) for multi-branch patch-based features, Dimension-Aware Selective Integration (DASI) for adaptive skip-connection fusion, and Multi-Dilated Channel Refiner (MDCR) for multi-scale channel refinement. The approach frames the task as semantic segmentation, is trained from scratch, and relies on feature preservation and targeted fusion to counter small-object loss and background clutter. Experiments on the SIRST dataset report state-of-the-art IoU and nIoU, and ablations confirm the contribution of each module. The method is relevant to infrared surveillance and related domains where tiny targets must be detected against challenging backgrounds.
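The two reported metrics differ only in where the averaging happens, which matters when targets span just a few pixels. The sketch below is a minimal pure-Python illustration and is not taken from the HCF-Net code base. The function names `iou` and `niou`, the nested-list mask format, and the convention of scoring an empty prediction against an empty ground truth as 0 are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Binary mask as nested sequences: 1 = target pixel, 0 = background (assumed format).
Mask = Sequence[Sequence[int]]


def _intersection_union(pred: Mask, truth: Mask) -> Tuple[int, int]:
    """Count intersecting and united target pixels for one image."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter, union


def iou(preds: Sequence[Mask], truths: Sequence[Mask]) -> float:
    """Pooled IoU: sum intersections and unions over all images, then divide."""
    total_inter = total_union = 0
    for pred, truth in zip(preds, truths):
        inter, union = _intersection_union(pred, truth)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 0.0


def niou(preds: Sequence[Mask], truths: Sequence[Mask]) -> float:
    """Normalized IoU: compute IoU per image, then average over images."""
    scores = []
    for pred, truth in zip(preds, truths):
        inter, union = _intersection_union(pred, truth)
        # Assumed convention: an image with no target pixels at all scores 0.
        scores.append(inter / union if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Image A: a 4-pixel target found exactly. Image B: a 1-pixel target missed,
# with one false-alarm pixel predicted elsewhere.
truths = [[[1, 1], [1, 1]], [[1, 0], [0, 0]]]
preds = [[[1, 1], [1, 1]], [[0, 1], [0, 0]]]

pooled = iou(preds, truths)      # 4 / 6: dominated by the larger target
per_image = niou(preds, truths)  # (1.0 + 0.0) / 2
```

In this toy case the pooled IoU stays high because the larger target dominates the pixel counts, while nIoU weights the missed single-pixel target equally with the detected one.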

Abstract

Infrared small object detection is an important computer vision task involving the recognition and localization of tiny objects in infrared images, which usually contain only a few pixels. However, it encounters difficulties due to the diminutive size of the objects and the generally complex backgrounds in infrared images. In this paper, we propose a deep learning method, HCF-Net, that significantly improves infrared small object detection performance through multiple practical modules. Specifically, it includes the parallelized patch-aware attention (PPA) module, dimension-aware selective integration (DASI) module, and multi-dilated channel refiner (MDCR) module. The PPA module uses a multi-branch feature extraction strategy to capture feature information at different scales and levels. The DASI module enables adaptive channel selection and fusion. The MDCR module captures spatial features of different receptive field ranges through multiple depth-separable convolutional layers. Extensive experimental results on the SIRST infrared single-frame image dataset show that the proposed HCF-Net performs well, surpassing other traditional and deep learning models. Code is available at https://github.com/zhengshuchen/HCFNet.

Paper Structure (16 sections, 8 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Network Architecture. The encoder primarily comprises the parallelized patch-aware attention (PPA) module and max-pooling layers, while the decoder mainly consists of PPA and convolutional transpose (CT) layers. We incorporate the multi-dilated channel refiner (MDCR) module as an intermediary layer to bridge the encoder and decoder. Within the skip-connection component, we introduce the dimension-aware selective integration (DASI) module to enhance the fusion and propagation of features across different network layers.
  • Figure 2: Detailed structure of the parallelized patch-aware attention module. This module primarily consists of two components: multi-branch fusion and attention mechanisms. The multi-branch fusion component includes patch-aware and concatenated convolutions. The 'p' parameter in patch-aware is set to 2 and 4, representing local and global branches, respectively.
  • Figure 3: Detailed structure of the dimension-aware selective integration module.
  • Figure 4: Detailed structure of the multi-dilated channel refiner module.
  • Figure 5: Visual examples from representative methods. Pink and green circles mark true positive and false positive objects, respectively. The pink rectangle zooms in on true positive objects to make differences in detection accuracy among the methods easier to see.
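The MDCR description (different receptive field ranges obtained from several dilated branches) rests on standard dilated-convolution arithmetic, which the following sketch works through. It is an illustration only and assumes a 3x3 kernel with stride 1. The dilation rates in `HYPOTHETICAL_DILATIONS` are placeholders and not the values used by HCF-Net, and the helper names are invented for this example.

```python
from typing import Dict, Iterable

# Placeholder dilation rates for illustration; NOT the rates used in the paper.
HYPOTHETICAL_DILATIONS = (1, 2, 3, 4)


def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def same_padding(kernel_size: int, dilation: int) -> int:
    """Padding that keeps the spatial size at stride 1 (odd kernel sizes)."""
    return dilation * (kernel_size - 1) // 2


def output_size(n: int, kernel_size: int, dilation: int, padding: int, stride: int = 1) -> int:
    """Standard convolution output-size formula along one axis."""
    return (n + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def branch_extents(kernel_size: int = 3,
                   dilations: Iterable[int] = HYPOTHETICAL_DILATIONS) -> Dict[int, int]:
    """Map each branch's dilation rate to the extent its kernel covers."""
    return {d: effective_kernel(kernel_size, d) for d in dilations}


extents = branch_extents()
# A 3x3 kernel covers 3, 5, 7 and 9 pixels per axis at dilations 1, 2, 3 and 4.

# With "same" padding every branch preserves the input resolution, so the
# branch outputs can be combined position by position.
sizes = [output_size(64, 3, d, same_padding(3, d)) for d in HYPOTHETICAL_DILATIONS]
```

Because each branch keeps the same number of weights while covering a wider area, a set of dilated branches can provide several context ranges at little extra parameter cost, which matches the multi-scale role the caption for Figure 1 assigns to the intermediary layer.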