Motion-Enhanced Nonlocal Similarity Implicit Neural Representation for Infrared Dim and Small Target Detection

Pei Liu, Yisi Luo, Wenzhen Wang, Xiangyong Cao

TL;DR

The paper addresses infrared dim and small target detection in dynamic multi-frame scenes with an unsupervised motion-enhanced nonlocal similarity implicit neural representation (optNL-INR). The approach combines optical-flow motion estimation, dynamic multi-frame fusion, nonlocal patch grouping, and a Tucker-decomposed INR background model with SIREN-based factor networks, optimized by an alternating direction method of multipliers (ADMM) with 3D total variation (3DTV) regularization. The authors provide theoretical results on the existence, spatial-temporal smoothness, and ADMM convergence of the nonlocal INR framework, and report state-of-the-art performance on the ATR and Anti-UAV datasets, with strong robustness and unsupervised generalization. The method separates targets from backgrounds more cleanly than the compared detectors in challenging infrared environments, with practical implications for surveillance and search-and-rescue applications.
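To make the idea of a Tucker-decomposed INR with sine-activated factor networks more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the network depth and width, the ranks, the grid sizes, and the names `init_siren`, `siren_forward`, and `tucker_inr` are all hypothetical, and no training loop, 3DTV term, or ADMM update is included. Each factor network maps a 1-D coordinate to one row of a factor matrix, and the background is assembled by contracting a core tensor with the three factor matrices.

```python
import numpy as np

def init_siren(rng, in_dim, hidden, out_dim, omega0=30.0):
    """Random weights for a two-layer sine-activated MLP (SIREN-style init)."""
    w1 = rng.uniform(-1.0 / in_dim, 1.0 / in_dim, size=(in_dim, hidden))
    bound = np.sqrt(6.0 / hidden) / omega0
    w2 = rng.uniform(-bound, bound, size=(hidden, out_dim))
    return {"w1": w1, "b1": np.zeros(hidden),
            "w2": w2, "b2": np.zeros(out_dim), "omega0": omega0}

def siren_forward(params, coords):
    """Map coordinates of shape (n, 1) to factor rows of shape (n, r)."""
    h = np.sin(params["omega0"] * (coords @ params["w1"] + params["b1"]))
    return h @ params["w2"] + params["b2"]

def tucker_inr(core, nets, grids):
    """Evaluate core x_1 U_1 x_2 U_2 x_3 U_3 with U_k = net_k(grid_k)."""
    u1, u2, u3 = (siren_forward(p, g[:, None]) for p, g in zip(nets, grids))
    return np.einsum("abc,ia,jb,kc->ijk", core, u1, u2, u3)

rng = np.random.default_rng(0)
ranks = (3, 3, 2)    # assumed Tucker ranks, for illustration only
sizes = (16, 16, 8)  # assumed height, width, number of frames
core = rng.standard_normal(ranks)
nets = [init_siren(rng, 1, 32, r) for r in ranks]

# Evaluate the continuous representation on the original grid ...
grids = [np.linspace(-1.0, 1.0, n) for n in sizes]
background = tucker_inr(core, nets, grids)

# ... and on a grid twice as fine, reusing the same networks and core.
fine_grids = [np.linspace(-1.0, 1.0, 2 * n) for n in sizes]
background_fine = tucker_inr(core, nets, fine_grids)
```

Because the factor matrices come from continuous functions of the coordinates, the same parameters can be evaluated at any resolution, and the Tucker rank of the output is bounded by the shape of the core regardless of the grid.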

Abstract

Infrared dim and small target detection presents a significant challenge due to dynamic multi-frame scenarios and weak target signatures in the infrared modality. Traditional low-rank plus sparse models often fail to capture dynamic backgrounds and global spatial-temporal correlations, which results in background leakage or target loss. In this paper, we propose a novel motion-enhanced nonlocal similarity implicit neural representation (INR) framework to address these challenges. We first integrate motion estimation via optical flow to capture subtle target movements, and propose multi-frame fusion to enhance motion saliency. Second, we leverage nonlocal similarity to construct patch tensors with strong low-rank properties, and propose an innovative tensor decomposition-based INR model to represent the nonlocal patch tensor, effectively encoding both the nonlocal low-rankness and spatial-temporal correlations of background through continuous neural representations. An alternating direction method of multipliers is developed for the nonlocal INR model, which enjoys theoretical fixed-point convergence. Experimental results show that our approach robustly separates dim targets from complex infrared backgrounds, outperforming state-of-the-art methods in detection accuracy and robustness.
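The nonlocal grouping step described in the abstract can be pictured with a small NumPy sketch. This is only an assumed, generic form of patch grouping (non-overlapping patches, Euclidean distance to a reference patch, keep the `k` nearest); the abstract does not specify the paper's patch size, stride, or similarity measure, and the helper names `extract_patches` and `group_similar` are hypothetical.

```python
import numpy as np

def extract_patches(frames, patch, stride):
    """Split a (H, W, T) sequence into spatial patches of shape (patch, patch, T)."""
    h, w, _ = frames.shape
    patches, corners = [], []
    for i in range(0, h - patch + 1, stride):
        for j in range(0, w - patch + 1, stride):
            patches.append(frames[i:i + patch, j:j + patch, :])
            corners.append((i, j))  # top-left corner, needed to put patches back
    return np.stack(patches), corners

def group_similar(patches, ref_index, k):
    """Indices of the k patches closest to the reference patch (Euclidean distance)."""
    flat = patches.reshape(len(patches), -1)
    dist = np.linalg.norm(flat - flat[ref_index], axis=1)
    return np.argsort(dist, kind="stable")[:k]

rng = np.random.default_rng(0)
frames = rng.standard_normal((32, 32, 5))  # synthetic stand-in for an infrared sequence
patches, corners = extract_patches(frames, patch=8, stride=8)
idx = group_similar(patches, ref_index=0, k=4)
group = patches[idx]  # stacked similar patches, shape (k, patch, patch, T)
```

Stacking similar patches along a new axis is what makes the resulting tensor a good candidate for a low-rank model: the more alike the patches, the more redundant the stack.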

Paper Structure

This paper contains 15 sections, 4 theorems, 12 equations, 7 figures, 2 tables, 1 algorithm.

Key Result

Lemma 1

For a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times\cdots\times n_N}$ with Tucker rank $\mathrm{rank}_{T}(\mathcal{X}) = (r_1,\cdots,r_N)$, there exist a core tensor $\mathcal{C} \in \mathbb{R}^{r_1 \times\cdots \times r_N}$ and $N$ factor matrices ${\bf U}_1 \in \mathbb{R}^{n_1 \times r
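For orientation, the standard Tucker form writes a tensor as a core tensor multiplied along each mode by a factor matrix, $\mathcal{X} = \mathcal{C} \times_1 {\bf U}_1 \times_2 \cdots \times_N {\bf U}_N$. The sketch below checks this numerically with a higher-order SVD (HOSVD) in NumPy. It illustrates the textbook decomposition only and is not taken from the paper; the function names and the test tensor are assumptions made for the example.

```python
import numpy as np

def unfold(x, mode):
    """Mode-n unfolding: the chosen mode becomes the rows."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def mode_product(x, mat, mode):
    """Multiply tensor x by matrix mat along the given mode."""
    out = np.tensordot(mat, x, axes=(1, mode))  # contracted mode lands on axis 0
    return np.moveaxis(out, 0, mode)

def hosvd(x, ranks):
    """Factor matrices from the leading left singular vectors of each unfolding."""
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(x, mode), full_matrices=False)
        factors.append(u[:, :r])
    core = x
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild the tensor from its core and factor matrices."""
    x = core
    for mode, u in enumerate(factors):
        x = mode_product(x, u, mode)
    return x

# Build a tensor with known Tucker rank (2, 3, 2), then recover core and factors.
rng = np.random.default_rng(0)
true_ranks, sizes = (2, 3, 2), (6, 7, 5)
x = reconstruct(rng.standard_normal(true_ranks),
                [rng.standard_normal((n, r)) for n, r in zip(sizes, true_ranks)])
core, factors = hosvd(x, true_ranks)
error = np.linalg.norm(reconstruct(core, factors) - x) / np.linalg.norm(x)
```

When the requested ranks equal the true Tucker rank, the reconstruction is exact up to floating-point error, which is the existence property the Tucker form expresses.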

Figures (7)

  • Figure 1: Average detection performance ($IoU$) vs. inference time (s) of several unsupervised detectors for IRSTD.
  • Figure 2: Optical flow diagrams estimated for multiple infrared image scenes. The bottom double-channel color images represent the intensities of motion matrices $({\bf D}_x,{\bf D}_y)$.
  • Figure 3: Overall flowchart of our optNL-INR for IRSTD. a) Motion enhancement integrates the original image $\mathcal{D}$ with the optical flow map $\mathcal{M}$ to generate an enhanced image $\mathcal{X}$. b) Nonlocal grouping employs patch split and grouping to obtain nonlocal similar STT models $\mathcal{X}^p$ and $\mathcal{B}^p$. c) Nonlocal INR represents the background $\mathcal{B}^p$ in a continuous domain to obtain the representation $\mathcal{B}_{\Theta}^{p}$. The separated target patch tensor $\mathcal{T}^p$ is computed via ADMM and reconstructed into the target image $\mathcal{T}$.
  • Figure 4: Convergence curves using relative error between $\mathcal{T}_{t-1}^p$ and $\mathcal{T}_t^p$ in the proposed optNL-INR algorithm.
  • Figure 5: $IoU$ curves w.r.t. hyperparameters in optNL-INR.
  • ...and 2 more figures
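Two quantities recur in the figure captions above: the relative error between successive target estimates (Figure 4) and $IoU$ (Figures 1 and 5). The sketch below shows one common way to compute each in NumPy. The normalization of the relative error and the handling of empty masks are assumptions, since the captions do not give the exact definitions, and the function names are hypothetical.

```python
import numpy as np

def relative_change(prev, curr, eps=1e-12):
    """Frobenius-norm change between successive iterates, relative to the previous one."""
    return np.linalg.norm(curr - prev) / (np.linalg.norm(prev) + eps)

def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return np.logical_and(pred, true).sum() / union

# Toy masks: two 2x2 squares that overlap in one column.
pred = np.zeros((4, 4), dtype=bool)
true = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:3] = True
true[1:3, 2:4] = True
score = iou(pred, true)  # 2 shared pixels out of 6 in the union

# Toy iterates: the estimate doubles from one step to the next.
change = relative_change(np.ones(3), 2.0 * np.ones(3))
```

In an iterative solver, a loop would typically stop once `relative_change` falls below a small tolerance; the tolerance value itself is a tuning choice not stated in this summary.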

Theorems & Definitions (5)

  • Lemma 1: Tensor Tucker decomposition
  • Theorem 1: Existence of the nonlocal INR model
  • proof
  • Theorem 2
  • Lemma 2