Learning Dynamic Local Context Representations for Infrared Small Target Detection
Guoyi Zhang, Guangsheng Xu, Han Wang, Siyang Chen, Yunxiao Shan, Xiaohu Zhang
TL;DR
Infrared small target detection (ISTD) is challenged by clutter, low signal-to-clutter ratios, and scale variation. The authors propose LCRNet, a lightweight U-Net–like model that learns dynamic local context representations through three components: Coarse-to-fine Convolution Block (C2FBlock), Dynamic Local Context Attention (DLC-Attention), and HLKConv for efficient large-kernel processing. Through a multigrid-inspired refinement, adaptive receptive-field allocation, and hierarchical large-kernel decomposition, LCRNet achieves state-of-the-art results on IRSTD-1K, SIRSTAUG, and NUDT-SIRST with only 1.65M parameters and low computational cost. Ablation studies validate the contributions of each component, and results indicate robust performance and practical efficiency, suggesting strong potential for real-time ISTD with further optimizations.
Abstract
Infrared small target detection (ISTD) is challenging due to complex backgrounds, low signal-to-clutter ratios, and varying target sizes and shapes. Effective detection relies on capturing local contextual information at the appropriate scale. However, small-kernel CNNs have limited receptive fields, leading to false alarms, while transformer models, with their global receptive fields, often treat small targets as noise, resulting in missed detections. Hybrid models struggle to bridge the semantic gap between CNNs and transformers, which leads to high complexity. To address these challenges, we propose LCRNet, a novel method that learns dynamic local context representations for ISTD. The model consists of three components: (1) C2FBlock, inspired by PDE solvers, for efficient small target information capture; (2) DLC-Attention, a large-kernel attention mechanism that dynamically builds context and reduces feature redundancy; and (3) HLKConv, a hierarchical convolution operator based on large-kernel decomposition that preserves sparsity and mitigates the drawbacks of dilated convolutions. Despite its simplicity, with only 1.65M parameters, LCRNet achieves state-of-the-art (SOTA) performance. Experiments on multiple datasets, comparing LCRNet with 33 SOTA methods, demonstrate its superior performance and efficiency.
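To make the trade-off behind large-kernel decomposition concrete, the following is a minimal sketch of the standard receptive-field and weight-count arithmetic for stacked depthwise convolutions. It is not the authors' HLKConv: the kernel sizes, dilation, and channel count are hypothetical values chosen only for illustration, and the helper names are assumptions introduced here.

```python
def receptive_field(layers):
    """Receptive field of a stack of stride-1 convolutions.

    layers: list of (kernel_size, dilation) pairs, applied in order.
    Each layer widens the field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf


def depthwise_weights(layers, channels):
    """Weight count (no bias) for square depthwise convolutions."""
    return sum(channels * k * k for k, _ in layers)


def sampled_fraction(kernel_size, dilation):
    """Fraction of pixels inside one dilated kernel's span that it samples.

    Values well below 1 indicate the sparse, grid-like sampling that is
    a known drawback of dilated convolutions.
    """
    span = (kernel_size - 1) * dilation + 1
    return (kernel_size * kernel_size) / (span * span)


# Hypothetical comparison: one dense 21x21 depthwise kernel versus a
# 5x5 depthwise kernel followed by a 7x7 depthwise kernel with dilation 3.
CHANNELS = 32  # assumed channel count, for illustration only
single = [(21, 1)]
decomposed = [(5, 1), (7, 3)]

if __name__ == "__main__":
    for name, layers in (("single", single), ("decomposed", decomposed)):
        print(name,
              "receptive field:", receptive_field(layers),
              "weights:", depthwise_weights(layers, CHANNELS))
    print("sampled fraction of the dilated 7x7:", sampled_fraction(7, 3))
```

Under these assumed settings the decomposed stack covers a comparable receptive field with far fewer weights, while the dilated layer alone samples only a small fraction of the pixels within its span. This is the kind of sparsity versus coverage tension the abstract refers to when it mentions the drawbacks of dilated convolutions.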
