It's Not the Target, It's the Background: Rethinking Infrared Small Target Detection via Deep Patch-Free Low-Rank Representations
Guoyi Zhang, Guangsheng Xu, Siyang Chen, Han Wang, Xiaohu Zhang
TL;DR
This work tackles infrared small target detection in cluttered scenes, where targets suffer from a low signal-to-clutter ratio (SCR) and high variability. It introduces LRRNet, a patch-free network that learns low-rank background priors directly in the image domain through a compression–reconstruction–subtraction framework, preserving small-target cues without resorting to patch-based robust PCA (RPCA). The model uses a densely connected encoder–decoder with a single self-attention block at low resolution, isolates targets by subtraction, and is trained with segmentation and reconstruction losses. Experiments on IRSTD-1K, SIRSTAUG, and NUDT-SIRST show state-of-the-art or competitive accuracy at real-time speed (82.34 FPS on average) together with robustness to sensor noise, highlighting the practical potential of interpretable low-rank priors for infrared small target detection.
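The TL;DR says training combines a segmentation loss and a reconstruction loss but does not spell out either term. The sketch below shows one plausible joint objective under stated assumptions: a soft-IoU segmentation term (a common choice in IRSTD, not confirmed as LRRNet's), a mean-squared-error reconstruction term against an unspecified supervision signal `recon_target`, and a hypothetical balancing weight `lam`. All function and parameter names are illustrative.

```python
def soft_iou_loss(pred, target, eps=1e-6):
    """1 - soft IoU between predicted probabilities and a binary mask (flat lists).

    Assumption: soft IoU is used here only as a typical IRSTD segmentation loss.
    """
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(p + t - p * t for p, t in zip(pred, target))
    return 1.0 - (inter + eps) / (union + eps)


def mse_loss(recon, recon_target):
    """Mean squared error between two equally sized flat lists."""
    if len(recon) != len(recon_target):
        raise ValueError("recon and recon_target must have the same length")
    return sum((r - x) ** 2 for r, x in zip(recon, recon_target)) / len(recon)


def total_loss(pred_mask, gt_mask, recon, recon_target, lam=1.0):
    """Hypothetical joint objective: segmentation term + lam * reconstruction term.

    What the reconstruction is supervised against (`recon_target`) and the
    value of `lam` are assumptions, not taken from the paper.
    """
    return soft_iou_loss(pred_mask, gt_mask) + lam * mse_loss(recon, recon_target)
```

With a perfect mask and an exact reconstruction, for example `total_loss([0, 1, 0], [0, 1, 0], [0.2, 0.4], [0.2, 0.4])`, both terms vanish and the loss is 0.0; the weight `lam` only trades off the two terms when neither is zero.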
Abstract
This is the pre-acceptance version; the final version is available in IEEE Transactions on Geoscience and Remote Sensing on IEEE Xplore: https://ieeexplore.ieee.org/document/11156113

Infrared small target detection (IRSTD) remains a long-standing challenge in complex backgrounds due to low signal-to-clutter ratios (SCR), diverse target morphologies, and the absence of distinctive visual cues. While recent deep learning approaches aim to learn discriminative representations, the intrinsic variability and weak priors of small targets often lead to unstable performance. In this paper, we propose a novel end-to-end IRSTD framework, termed LRRNet, which leverages the low-rank property of infrared image backgrounds. Inspired by the physical compressibility of cluttered scenes, our approach adopts a compression–reconstruction–subtraction (CRS) paradigm to directly model structure-aware low-rank background representations in the image domain, without relying on patch-based processing or explicit matrix decomposition. To the best of our knowledge, this is the first work to directly learn low-rank background structures using deep neural networks in an end-to-end manner. Extensive experiments on multiple public datasets demonstrate that LRRNet outperforms 38 state-of-the-art methods in terms of detection accuracy, robustness, and computational efficiency. Remarkably, it achieves real-time performance with an average speed of 82.34 FPS. Evaluations on the challenging NoisySIRST dataset further confirm the model's resilience to sensor noise. The source code will be made publicly available upon acceptance.
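The CRS idea can be illustrated without a network. The sketch below is not LRRNet: LRRNet learns the background with a deep network and explicitly avoids matrix decomposition, whereas this stand-in uses a classical truncated SVD (computed by power iteration with deflation) on the whole image, so it is patch-free but not learned. "Compression" keeps only the top `rank` singular components, "reconstruction" sums them into a background estimate, and "subtraction" clips the image-minus-background residual to obtain a target map. The helper names, `rank`, and `thresh` are hypothetical.

```python
import math
import random


def _norm(x):
    return math.sqrt(sum(t * t for t in x))


def _matvec(A, v):
    # A @ v for a list-of-rows matrix
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def _rmatvec(A, u):
    # A^T @ u for a list-of-rows matrix
    return [sum(A[i][j] * u[i] for i in range(len(A))) for j in range(len(A[0]))]


def _top_triplet(A, iters=100, seed=0):
    """Leading singular triplet (s, u, v) of A by power iteration."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(len(A[0]))]
    s, u = 0.0, [0.0] * len(A)
    for _ in range(iters):
        u = _matvec(A, v)
        nu = _norm(u)
        if nu == 0.0:  # A is (numerically) zero: nothing left to extract
            return 0.0, [0.0] * len(A), v
        u = [x / nu for x in u]
        v = _rmatvec(A, u)
        s = _norm(v)
        if s == 0.0:
            return 0.0, u, v
        v = [x / s for x in v]
    return s, u, v


def low_rank_background(img, rank=1):
    """Compression + reconstruction: keep the top-`rank` singular components of the whole image."""
    residual = [row[:] for row in img]
    background = [[0.0] * len(img[0]) for _ in img]
    for _ in range(rank):
        s, u, v = _top_triplet(residual)
        for i in range(len(img)):
            for j in range(len(img[0])):
                c = s * u[i] * v[j]
                background[i][j] += c  # add this rank-1 component to the background
                residual[i][j] -= c    # deflate before extracting the next component
    return background


def crs_detect(img, rank=1, thresh=1.0):
    """Subtraction: target map = max(image - background, 0); mask = target map > thresh."""
    bg = low_rank_background(img, rank)
    target = [[max(x - b, 0.0) for x, b in zip(r_img, r_bg)]
              for r_img, r_bg in zip(img, bg)]
    mask = [[t > thresh for t in row] for row in target]
    return target, mask
```

On a synthetic rank-1 background (for example `10.0 * (i + 1) * (1 + 0.1 * j)` on a 5×6 grid) with one pixel brightened by 5, the rank-1 background absorbs the smooth clutter and the clipped residual concentrates at the inserted pixel. A learned model would replace both the SVD and the fixed threshold.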
