
Neural Spatial-Temporal Tensor Representation for Infrared Small Target Detection

Fengyi Wu, Simin Liu, Haoan Wang, Bingjie Tao, Junhai Luo, Zhenming Peng

TL;DR

This work tackles infrared small target detection by linking optimization-based background modeling with neural implicit representations in a unified unsupervised framework. It introduces NeurSTT, which represents the background as a neural-revealed low-rank tensor $\mathcal{B}^{\text{NLR}} = [\mathcal{G}; f_{\theta_h}, f_{\theta_w}, f_{\theta_t}]$, and enforces temporal-spatial regularity via a novel neural 3-D total variation (Neur3DTV). The method jointly estimates the background and targets through an alternating minimization that combines nuclear-norm, sparsity, and Neur3DTV losses, with targets recovered by a soft-thresholding step and all parameters updated by Adam. Empirical results on multiple ISTD datasets show that NeurSTT achieves higher IoU and F1 scores with significantly fewer parameters (up to 16.6× fewer) and stronger background suppression, demonstrating robust, unsupervised target detection in challenging scenes. The approach highlights the value of domain-informed neural representations for tensor-based background modeling in video ISTD, with potential applicability to other low-rank + sparse detection tasks.
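The background representation in the TL;DR can be read as a Tucker-style product of a core tensor G with three factor matrices, each produced by a small network evaluated on one axis (height, width, time). The NumPy sketch below illustrates that reading. It is not the authors' implementation: the one-hidden-layer tanh networks, the coordinate normalization to [-1, 1], the ranks, and the helper names (`make_factor_net`, `neural_low_rank_tensor`) are all hypothetical, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_factor_net(rank, hidden=16):
    """Hypothetical factor network f_theta: scalar coordinate -> rank-dim vector.

    One tanh hidden layer with random (untrained) weights; the paper's actual
    architecture may differ.
    """
    W1 = rng.normal(size=(1, hidden))
    b1 = rng.normal(size=hidden)
    W2 = rng.normal(size=(hidden, rank)) / np.sqrt(hidden)

    def f(coords):
        # coords: shape (n,), normalized positions -> factor matrix (n, rank)
        return np.tanh(coords[:, None] @ W1 + b1) @ W2

    return f


def neural_low_rank_tensor(G, f_h, f_w, f_t, H, W, T):
    """Evaluate [G; f_h, f_w, f_t] on an H x W x T grid."""
    U_h = f_h(np.linspace(-1.0, 1.0, H))  # (H, r1)
    U_w = f_w(np.linspace(-1.0, 1.0, W))  # (W, r2)
    U_t = f_t(np.linspace(-1.0, 1.0, T))  # (T, r3)
    # Tucker-style mode products:
    # B[i, j, k] = sum_{a,b,c} G[a, b, c] * U_h[i, a] * U_w[j, b] * U_t[k, c]
    return np.einsum("abc,ia,jb,kc->ijk", G, U_h, U_w, U_t)


ranks = (4, 4, 2)
G = rng.normal(size=ranks)  # core tensor (learnable in practice)
f_h, f_w, f_t = (make_factor_net(r) for r in ranks)
B = neural_low_rank_tensor(G, f_h, f_w, f_t, H=32, W=32, T=8)
print(B.shape)  # (32, 32, 8)
```

Because each factor matrix has only r_n columns, the mode-n unfolding of B has rank at most r_n, so the low-rank prior is built into the representation itself rather than imposed by an external solver.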

Abstract

Optimization-based approaches dominate infrared small target detection as they leverage infrared imagery's intrinsic low-rankness and sparsity. While effective for single-frame images, they struggle with dynamic changes in multi-frame scenarios because traditional spatial-temporal representations often fail to adapt. To address these challenges, we introduce a Neural-represented Spatial-Temporal Tensor (NeurSTT) model. This framework employs nonlinear networks to enhance spatial-temporal feature correlations in background approximation, thereby supporting target detection in an unsupervised manner. Specifically, we employ neural layers to approximate sequential backgrounds within a low-rank-informed deep scheme. A neural three-dimensional total variation is developed to refine background smoothness while reducing static target-like clusters in sequences. Traditional sparsity constraints are incorporated into the loss functions to preserve potential targets. By replacing complex solvers with a deep updating strategy, NeurSTT simplifies the optimization process in a domain-aware way. Visual and numerical results across various datasets demonstrate that our method outperforms existing approaches in challenging detection scenarios. Notably, it has 16.6$\times$ fewer parameters and achieves an average $IoU$ 19.19% higher than the second-best method on $256 \times 256$ sequences.
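The exact form of the neural three-dimensional total variation is not given in this section. As a point of reference, the sketch below computes the classical anisotropic 3-D total variation that such a regularizer presumably builds on: summed absolute forward differences along height, width and time. It is a stand-in, not Neur3DTV itself; the function name `tv3d` and the per-axis `weights` are hypothetical.

```python
import numpy as np


def tv3d(X, weights=(1.0, 1.0, 1.0)):
    """Classical anisotropic 3-D total variation of an (H, W, T) tensor.

    NOTE: illustrative stand-in only, not the paper's Neur3DTV.
    `weights` (hypothetical) scale the height, width and temporal terms.
    """
    wh, ww, wt = weights
    dh = np.abs(np.diff(X, axis=0)).sum()  # vertical neighbours
    dw = np.abs(np.diff(X, axis=1)).sum()  # horizontal neighbours
    dt = np.abs(np.diff(X, axis=2)).sum()  # same pixel in consecutive frames
    return wh * dh + ww * dw + wt * dt


smooth = np.ones((8, 8, 4))
spiky = smooth.copy()
spiky[4, 4, :] += 5.0  # a static bright spot present in every frame
print(tv3d(smooth), tv3d(spiky))  # 0.0 80.0
```

The static spot is penalized only through its spatial differences (40 along each spatial axis); its temporal term is zero because it does not change between frames, which is one reason a separate temporal weight can be useful.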

Paper Structure

This paper contains 19 sections, 39 equations, 12 figures, 11 tables, and 1 algorithm.

Figures (12)

  • Figure 1: Comparison of existing spatial-temporal tensor schemes for ISTD with our NeurSTT, which uses a physics-informed (low-rankness and sparsity) unsupervised learning strategy and has fewer parameters and a faster execution time.
  • Figure 2: Schematic of the tensor function representation method in Eq. (eq:lrtfr).
  • Figure 3: Overall procedure of NeurSTT model for ISTD. The background spatial-temporal tensor is represented by a neural function with a low-rank prior, $\mathcal{B}^{\text{NLR}}$, followed by 3-D neural total variation regularization, $\Psi_{Neur3DTV}$. The target tensor is determined using a soft-thresholding operator. Each module has a specific loss constraint, combined into an overall loss function, with parameters updated using deep strategies.
  • Figure 4: Performance of different configurations across epochs on Seq. 3, where Config. II is in gray, Config. III is in sky blue, and Config. IV (NeurSTT) is in purple. Notably, the proposed model effectively reduces false alarms and preserves targets as the number of epochs increases.
  • Figure 5: Nuclear norm loss $\mathcal{L}_{\text{Nuc}}$ under different configurations on Seq. 3. Our NeurSTT model recovers the background fastest in early epochs while remaining relatively stable compared with the model without Neur3DTV.
  • ...and 7 more figures
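Figure 3 describes a background module, a Neur3DTV regularizer and a soft-thresholded target tensor whose losses are combined into one objective. The NumPy sketch below evaluates such a combined objective once for a given background estimate. Several pieces are assumptions rather than details from the paper: the squared Frobenius data term for D ≈ B + S, the nuclear-norm loss taken as a sum over the three mode unfoldings, classical 3-D TV standing in for Neur3DTV, and every weight and the threshold `lam`. The Adam update of the network parameters is omitted, since it needs an autodiff framework.

```python
import numpy as np


def soft_threshold(X, lam):
    """Elementwise shrinkage: sign(x) * max(|x| - lam, 0)."""
    return np.sign(X) * np.maximum(np.abs(X) - lam, 0.0)


def unfold(X, mode):
    """Mode-n unfolding of a 3-D tensor into a matrix with X.shape[mode] rows."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def nuclear_loss(B):
    # Assumed form: sum of nuclear norms of the three mode unfoldings
    return sum(np.linalg.norm(unfold(B, m), ord="nuc") for m in range(3))


def tv3d(B):
    # Classical anisotropic 3-D TV, a stand-in for Neur3DTV
    return sum(np.abs(np.diff(B, axis=a)).sum() for a in range(3))


def total_loss(D, B, lam=0.5, w_nuc=1e-2, w_sp=1e-1, w_tv=1e-2):
    """One evaluation of the combined objective; all weights are hypothetical."""
    S = soft_threshold(D - B, lam)       # target estimate from the residual
    fidelity = np.sum((D - B - S) ** 2)  # assumed data term: D ~ B + S
    loss = (fidelity
            + w_nuc * nuclear_loss(B)
            + w_sp * np.abs(S).sum()
            + w_tv * tv3d(B))
    return loss, S


# Usage: flat background, light noise, one small bright target in every frame
rng = np.random.default_rng(0)
B = np.ones((16, 16, 4))
D = B + 0.01 * rng.normal(size=B.shape)
D[8, 8, :] += 3.0
loss, S = total_loss(D, B)
print(np.argwhere(S > 0)[:, :2])  # expect pixel (8, 8) once per frame
```

The soft-thresholding step keeps only residuals larger than `lam`, so weak noise is absorbed by the background while the bright target survives in S.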