
Rethinking Generalizable Infrared Small Target Detection: A Real-scene Benchmark and Cross-view Representation Learning

Yahao Lu, Yuehui Li, Xingyuan Guo, Shuai Yuan, Yukai Shi, Liang Lin

TL;DR

The paper tackles domain shift in infrared small target detection (ISTD) with a domain-adaptive framework that combines Cross-view Channel Alignment (CCA), Cross-view Top-K Fusion, and Noise-guided Representation Learning. To reduce distribution gaps during cross-domain training, it employs a gamma-corrected channel alignment, $O_{Gray} = (\frac{I_{Gray}}{255})^{\frac{1}{\gamma_{Gray}}}$, an SSIM-guided Top-K patch fusion with Poisson blending, and a noise-consistency loss $\mathcal{L}_{Noise} = \mathcal{L}_{MSE}(F_{Global}^{P}, F_{Global}^{N})$ used alongside $\mathcal{L}_{BCE}$. A new RealScene-ISTD benchmark (540×420) aggregates NUAA-SIRST, IRSTD-1K, and 739 UAV infrared images covering Tiny, Normal, and Large targets to stress cross-domain generalization. Experiments on RealScene-ISTD and IRSTD-1K report higher $IoU$ and $P_d$ and lower $F_a$, and ROC curves indicate better discrimination, which suggests practical value for real-world ISTD deployments.
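The gamma-corrected alignment formula above can be illustrated with a minimal NumPy sketch. `gamma_align` applies $O_{Gray} = (I_{Gray}/255)^{1/\gamma_{Gray}}$ directly. How the paper selects $\gamma_{Gray}$ is not stated in this summary, so `gamma_for_target_mean` is a hypothetical heuristic (an assumption, not the authors' rule) that picks the gamma mapping the source image's mean intensity onto a chosen target-domain mean.

```python
import numpy as np


def gamma_align(gray, gamma):
    """Apply O = (I / 255) ** (1 / gamma) to an 8-bit grayscale image.

    Returns a float array with values in [0, 1].
    """
    normalized = gray.astype(np.float64) / 255.0
    return normalized ** (1.0 / gamma)


def gamma_for_target_mean(gray, target_mean):
    """Hypothetical heuristic (not from the paper): choose gamma so that the
    source image's mean intensity is mapped onto `target_mean` (in (0, 1)).

    Solving m ** (1 / gamma) = t for gamma gives gamma = log(m) / log(t).
    """
    source_mean = float(np.clip(gray.mean() / 255.0, 1e-6, 1.0 - 1e-6))
    target_mean = float(np.clip(target_mean, 1e-6, 1.0 - 1e-6))
    return np.log(source_mean) / np.log(target_mean)


if __name__ == "__main__":
    # A dark, constant source image aligned to a brighter target mean of 0.5.
    source = np.full((4, 4), 64, dtype=np.uint8)
    gamma = gamma_for_target_mean(source, target_mean=0.5)
    aligned = gamma_align(source, gamma)
    print(gamma, aligned.mean())
```

For a non-constant image the heuristic matches the mapped mean intensity only approximately, since the power function is nonlinear; it is meant to show the direction of the correction rather than an exact statistic match.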

Abstract

Infrared small target detection (ISTD) is highly sensitive to sensor type, observation conditions, and the intrinsic properties of the target. These factors can introduce substantial variations in the distribution of acquired infrared image data, a phenomenon known as domain shift. Such distribution discrepancies significantly hinder the generalization capability of ISTD models across diverse scenarios. To tackle this challenge, this paper introduces an ISTD framework enhanced by domain adaptation. To alleviate distribution shift between datasets and achieve cross-sample alignment, we introduce Cross-view Channel Alignment (CCA). Additionally, we propose the Cross-view Top-K Fusion strategy, which integrates target information with diverse background features, enhancing the model's ability to extract critical data characteristics. To further mitigate the impact of noise on ISTD, we develop a Noise-guided Representation Learning strategy. This approach enables the model to learn more noise-resistant feature representations, improving its generalization capability across diverse noisy domains. Finally, we develop a dedicated infrared small target dataset, RealScene-ISTD. Compared to state-of-the-art methods, our approach demonstrates superior performance in terms of detection probability (Pd), false alarm rate (Fa), and intersection over union (IoU). The code is available at: https://github.com/luy0222/RealScene-ISTD.
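The noise-guided objective can be sketched from the loss given in the TL;DR, $\mathcal{L}_{Noise} = \mathcal{L}_{MSE}(F_{Global}^{P}, F_{Global}^{N})$, combined with $\mathcal{L}_{BCE}$. The sketch below assumes that the two global feature vectors come from an input and its noise-perturbed counterpart, and that the two terms are summed with a weight; the `noise_weight` parameter and the function names are hypothetical, since the weighting is not specified in this summary.

```python
import numpy as np


def mse_loss(feat_a, feat_b):
    """Mean squared error between two feature arrays of equal shape."""
    return float(np.mean((feat_a - feat_b) ** 2))


def bce_loss(pred, target, eps=1e-7):
    """Binary cross-entropy between predicted probabilities and a 0/1 mask."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(pred)
                          + (1.0 - target) * np.log(1.0 - pred)))


def total_loss(pred_mask, gt_mask, feat_p, feat_n, noise_weight=1.0):
    """Segmentation loss plus noise-consistency loss.

    `noise_weight` is an assumed hyperparameter, not taken from the paper.
    """
    return bce_loss(pred_mask, gt_mask) + noise_weight * mse_loss(feat_p, feat_n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = (rng.random((8, 8)) > 0.9).astype(np.float64)  # sparse target mask
    pred = np.full((8, 8), 0.5)                         # uninformative prediction
    feat_p = rng.random(16)                             # stand-in global features
    feat_n = feat_p + 0.1 * rng.standard_normal(16)     # features under noise
    print(total_loss(pred, gt, feat_p, feat_n))
```

Minimizing the MSE term pulls the two feature vectors together, which is the mechanism the abstract describes for learning noise-resistant representations.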

Paper Structure

This paper contains 15 sections, 10 equations, 9 figures, 9 tables, 2 algorithms.

Figures (9)

  • Figure 1: Our Noise-guided Representation Learning process. Compared to traditional ISTD work, we generate outlier features (red dots) in a noise-guided manner. By minimizing the feature space distance between $F^P_{Global}$ and $F^N_{Global}$, we guide the model to learn more essential feature representations from limited data (blue dots).
  • Figure 2: Illustration of the proposed generalizable infrared small target detection framework. (a) The Cross-view Channel Alignment and Top-K Poisson Fusion method aims to effectively expand external data and deeply explore internal latent information to address domain shift issues; (b) The feature fusion module establishes semantic connections at different levels through multi-scale feature interactions; (c) The Noise-guided Representation Learning strategy mitigates domain shift caused by noise variations in different infrared images, achieving higher generalization capability.
  • Figure 3: Cross-view Channel Alignment effectively improves the alignment of average pixel distributions between source and target domain images, thereby significantly enhancing their visual similarity.
  • Figure 4: Cross-view Top-K Poisson Fusion operates by scanning the image with a sliding window to identify local regions that closely match the target patch. The top-matching regions are then selected and seamlessly blended using Poisson fusion.
  • Figure 5: Effectiveness of Cross-view Top-K Poisson Fusion. Unlike conventional image stitching techniques, Poisson fusion generates synthetic samples with greater visual authenticity.
  • ...and 4 more figures
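
The sliding-window search described in the Figure 4 caption can be sketched as follows. This is a simplified illustration under stated assumptions: `patch_ssim` computes a single-window SSIM over the whole patch (not the locally windowed variant common in image-quality libraries), `top_k_regions` and its parameters are hypothetical names, and the Poisson blending step is omitted. A routine such as OpenCV's `seamlessClone` could be one option for that step, though this summary does not say what the authors use.

```python
import numpy as np


def patch_ssim(a, b, data_range=1.0):
    """Single-window SSIM between two equally sized grayscale patches."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(numerator / denominator)


def top_k_regions(background, patch, k=3, stride=1):
    """Slide a window over `background` and return the k windows most similar
    to `patch`, as (score, row, col) tuples sorted by descending score."""
    ph, pw = patch.shape
    bh, bw = background.shape
    scored = []
    for y in range(0, bh - ph + 1, stride):
        for x in range(0, bw - pw + 1, stride):
            window = background[y:y + ph, x:x + pw]
            scored.append((patch_ssim(window, patch), y, x))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    background = rng.random((20, 20))
    patch = background[5:9, 7:11].copy()  # a patch known to exist at (5, 7)
    for score, y, x in top_k_regions(background, patch, k=3):
        print(round(score, 4), y, x)
```

The exhaustive double loop is quadratic in image size; a larger `stride` trades placement precision for speed.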