
Infrared Small Target Detection in Satellite Videos: A New Dataset and A Novel Recurrent Feature Refinement Framework

Xinyi Ying, Li Liu, Zaipin Lin, Yangsi Shi, Yingqian Wang, Ruojing Li, Xu Cao, Boyang Li, Shilin Zhou, Wei An

TL;DR

The paper tackles multi-frame infrared small target (MIRST) detection in satellite videos. It introduces IRSatVideo-LEO, a large-scale semi-simulated dataset, and a recurrent feature refinement (RFR) framework that exploits long-term temporal dependencies for joint motion compensation and detection. The core methodological contributions are a Pyramid Deformable Alignment (PDA) module for coarse-to-fine motion compensation and a Temporal-Spatial-Frequency Modulation (TSFM) module that dynamically fuses and enhances target features. Extensive experiments show that ResUNet equipped with RFR achieves state-of-the-art performance on IRSatVideo-LEO, with robust results across varying background complexities and target signal-to-clutter ratios (SCRs). The approach also generalizes well to one-pixel and sub-pixel targets. The work contributes a public dataset and an end-to-end framework relevant to satellite surveillance and remote sensing applications.

Abstract

Multi-frame infrared small target (MIRST) detection in satellite videos has been a long-standing, fundamental yet challenging task for decades, and its challenges can be summarized as follows. First, the extremely small target size, highly complex clutter and noise, and various satellite motions result in limited feature representation, high false alarm rates, and difficult motion analysis. Second, the lack of a large-scale, publicly available MIRST dataset of satellite videos greatly hinders algorithm development. To address these challenges, we first build a large-scale dataset for MIRST detection in satellite videos (namely IRSatVideo-LEO), and then develop a recurrent feature refinement (RFR) framework as the baseline method. IRSatVideo-LEO is a semi-simulated dataset with synthesized satellite motion, target appearance, trajectory, and intensity. It can serve as a standard toolbox for satellite video generation and a reliable evaluation platform to facilitate algorithm development. As the baseline method, RFR is designed to be equipped with existing powerful CNN-based methods to exploit long-term temporal dependencies and to integrate motion compensation with MIRST detection. Specifically, a pyramid deformable alignment (PDA) module and a temporal-spatial-frequency modulation (TSFM) module are proposed to achieve effective and efficient feature alignment, propagation, aggregation, and refinement. Extensive experiments demonstrate the effectiveness and superiority of our scheme: the comparative results show that ResUNet equipped with RFR outperforms state-of-the-art MIRST detection methods. Dataset and code are released at https://github.com/XinyiYing/RFR.

Paper Structure

This paper contains 34 sections, 17 equations, 14 figures, and 8 tables.

Figures (14)

  • Figure 1: Implementation details of the IRSatVideo-LEO dataset, which is built in four steps: (a) Background sequence simulation. A 3D attitude sequence is first used to generate a global background sequence from a SWIR satellite image by homography transformation, and a 2D location is then used to crop the local background sequence. (b) Target appearance sequence simulation. Several key target appearances (e.g., the red and blue Gaussian kernel images) are used to generate the target appearance sequence by quadratic interpolation (e.g., the light red and light blue Gaussian kernel images are interpolated results). (c) Target trajectory simulation. We first generate the trajectory in the $1^{st}$ frame reference by connecting and smoothing multiple separate trajectories, and then use the satellite motion to transform it into the current frame reference. (d) Target-background superposition. The target appearance sequence is multiplied by the target intensity, and the result is adaptively weighted and summed with the background sequence at the positions given by the target trajectory to generate the simulated sequence.
  • Figure 2: Illustrations of sequence attributes. (a) shows the distributions of cloud cover and location with respect to (w.r.t.) the number of sequences in the training and test sets. (b) shows the distributions of instance number and sequence length w.r.t. the number of sequences in the training and test sets. The numbers give the corresponding sequence counts.
  • Figure 3: Illustrations of infrared small targets and their local background neighborhoods. (a) Example images of infrared small targets. (b1) The local background neighborhood is obtained by extending the target bounding box (BBox; $h_0$ in height and $w_0$ in width) by $d$ in both height and width. (b2) The target region (yellow area) and the local background region (purple area).
  • Figure 4: Example target trajectories in satellite videos. (a), (b), and (c) show target trajectories without (w/o) swerving and with (w.) one and two swerves, respectively. Note that the discrete trajectory points are shown with a sampling rate of 20 for better visualization. The density of the discrete points represents the velocity of the target, and dense points represent high velocity. Red arrows indicate the swerving positions.
  • Figure 5: Illustrations of moving backgrounds and swerving targets. For the moving background, (a1) and (b1) show image sets of the local background sequence in the $1^{st}$ frame reference, and the yellow arrows represent the moving directions of the background. For the swerving target, (a1) and (b1) show the target trajectories in the $1^{st}$ frame reference, and the ends of the trajectories are marked by circular arrows. (a2) and (b2) show the original end-to-end adjacent trajectories (green lines) and the smoothed adjacent trajectories (blue lines). Red arrows indicate the swerving positions.
  • ...and 9 more figures
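Figure 1(b) above generates a target appearance sequence by quadratically interpolating a few key Gaussian kernel images. The following is a minimal sketch of that idea, not the authors' toolbox. It assumes exactly three key kernels placed at given frame indices and uses pixel-wise Lagrange interpolation in time. The helper names (`gaussian_kernel`, `quadratic_interpolate`, `appearance_sequence`) and the kernel size of 7 are hypothetical.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel whose centre pixel equals 1."""
    c = (size - 1) / 2.0
    return [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * sigma ** 2))
             for x in range(size)] for y in range(size)]


def quadratic_interpolate(keys, key_times, t):
    """Pixel-wise quadratic (Lagrange) interpolation through three key images."""
    t0, t1, t2 = key_times
    # Lagrange basis weights; they sum to 1 for every t.
    w0 = (t - t1) * (t - t2) / ((t0 - t1) * (t0 - t2))
    w1 = (t - t0) * (t - t2) / ((t1 - t0) * (t1 - t2))
    w2 = (t - t0) * (t - t1) / ((t2 - t0) * (t2 - t1))
    k0, k1, k2 = keys
    size = len(k0)
    # Clamp at 0 because a quadratic can undershoot between key frames.
    return [[max(0.0, w0 * k0[y][x] + w1 * k1[y][x] + w2 * k2[y][x])
             for x in range(size)] for y in range(size)]


def appearance_sequence(key_sigmas, key_times, num_frames, size=7):
    """Hypothetical helper: one interpolated appearance image per frame."""
    keys = [gaussian_kernel(size, s) for s in key_sigmas]
    return [quadratic_interpolate(keys, key_times, t) for t in range(num_frames)]


# Usage: a target that grows from sigma 1.0 to 1.5 and shrinks back over 11 frames.
seq = appearance_sequence([1.0, 1.5, 1.0], (0, 5, 10), num_frames=11)
print(len(seq))
```

At the key frames, the Lagrange weights reduce to (1, 0, 0), (0, 1, 0), or (0, 0, 1). The sequence therefore passes exactly through the key appearances and varies smoothly between them.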
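Figure 1(a) and (c) above use homographies to relate each frame to the $1^{st}$ frame reference, and Figure 1(c) then transforms the trajectory into the current frame reference. The sketch below shows only that last coordinate transformation, under stated assumptions. It assumes a per-frame 3x3 homography `H_t` that maps $1^{st}$-frame pixel coordinates into frame $t$ is already available. In the paper, `H_t` is derived from a 3D attitude sequence, which is not reproduced here. The helper names are hypothetical, and the example uses pure translations only for illustration.

```python
def apply_homography(H, point):
    """Map one (x, y) point through a 3x3 homography given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    # Divide by the homogeneous coordinate to return to pixel coordinates.
    return (u / w, v / w)


def to_current_frame(trajectory_ref, homographies):
    """Hypothetical helper: trajectory_ref[t] is the target position in the
    1st-frame reference; homographies[t] maps 1st-frame pixels into frame t."""
    return [apply_homography(H, p) for H, p in zip(homographies, trajectory_ref)]


def translation(dx, dy):
    """Illustrative homography that only shifts the image."""
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]


# Usage: the target moves +1 px/frame while the background drifts -0.5 px/frame.
traj_ref = [(10.0 + t, 20.0) for t in range(4)]
Hs = [translation(-0.5 * t, 0.0) for t in range(4)]
print(to_current_frame(traj_ref, Hs))
```

With a real attitude-driven homography, the last row of `H` is no longer `[0, 0, 1]`. The division by `w` then accounts for perspective effects.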
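Figure 1(c) and Figure 5 above describe building a swerving trajectory by connecting separate trajectories end to end and then smoothing the sharp junctions. The following is a minimal sketch under an assumption: a centred moving average is used as the smoothing step, because the exact smoothing method is not given here. The helper names and the `half_window` parameter are hypothetical.

```python
def connect_segments(segments):
    """Chain separate trajectory segments end to end: each segment is shifted
    so that it starts where the previous one ended."""
    path = list(segments[0])
    for seg in segments[1:]:
        dx = path[-1][0] - seg[0][0]
        dy = path[-1][1] - seg[0][1]
        # Skip seg[0]; after shifting it coincides with the current end point.
        path.extend((x + dx, y + dy) for x, y in seg[1:])
    return path


def smooth(path, half_window=2):
    """Centred moving average; the window shrinks symmetrically near the
    ends so that the first and last points stay fixed."""
    n = len(path)
    out = []
    for i in range(n):
        r = min(half_window, i, n - 1 - i)
        window = path[i - r:i + r + 1]
        out.append((sum(p[0] for p in window) / len(window),
                    sum(p[1] for p in window) / len(window)))
    return out


# Usage: a rightward segment followed by an upward one gives one swerve.
raw = connect_segments([[(0, 0), (1, 0), (2, 0), (3, 0)],
                        [(10, 10), (10, 11), (10, 12), (10, 13)]])
smoothed = smooth(raw, half_window=1)
print(raw)
print(smoothed)
```

On straight, constant-speed stretches, the symmetric window leaves points unchanged. Only the corner at the swerve is rounded, which mirrors the green-to-blue change shown in Figure 5(a2) and (b2).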
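Figure 1(d) above multiplies the target appearance by its intensity and adaptively blends the result with the background. The exact weighting is not spelled out here, so this sketch assumes a simple rule: the Gaussian kernel value `k` at each pixel acts as the blending weight, giving `(1 - k) * background + k * intensity`. The helper names are hypothetical. The target centre may be sub-pixel because the kernel is evaluated at continuous coordinates.

```python
import math


def render_target(height, width, cx, cy, sigma):
    """Gaussian spot with peak 1 centred at a (possibly sub-pixel) location."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)] for y in range(height)]


def superimpose(background, cx, cy, sigma, intensity):
    """Hypothetical blending rule: the kernel value k acts as a per-pixel
    weight, so each pixel becomes (1 - k) * background + k * intensity."""
    h, w = len(background), len(background[0])
    k = render_target(h, w, cx, cy, sigma)
    return [[(1.0 - k[y][x]) * background[y][x] + k[y][x] * intensity
             for x in range(w)] for y in range(h)]


# Usage: a 15 x 15 flat background with a target at its centre.
bg = [[0.2] * 15 for _ in range(15)]
frame = superimpose(bg, cx=7, cy=7, sigma=1.0, intensity=0.9)
print(round(frame[7][7], 6), round(frame[0][0], 6))
```

This rule keeps every pixel between the background value and the target intensity. A purely additive rule would not have that property and could exceed the sensor range on bright backgrounds.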
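Figure 3 above defines a target region (the $h_0 \times w_0$ BBox) and a local background region obtained by extending the BBox by $d$. A common way to turn these regions into a signal-to-clutter ratio is $\mathrm{SCR} = |\mu_t - \mu_b| / \sigma_b$, where $\mu_t$ is the target mean, $\mu_b$ is the background mean, and $\sigma_b$ is the background standard deviation. The sketch below assumes that definition and assumes that $d$ is added on each side of the BBox. Neither assumption is confirmed by the captions above, and `local_scr` is a hypothetical helper.

```python
import math


def local_scr(image, top, left, h0, w0, d):
    """SCR = |mu_t - mu_b| / sigma_b, with the local background taken as the
    ring between the BBox and the BBox extended by d pixels on every side
    (clipped at the image border)."""
    H, W = len(image), len(image[0])
    target, background = [], []
    for y in range(max(0, top - d), min(H, top + h0 + d)):
        for x in range(max(0, left - d), min(W, left + w0 + d)):
            inside = top <= y < top + h0 and left <= x < left + w0
            (target if inside else background).append(image[y][x])
    mu_t = sum(target) / len(target)
    mu_b = sum(background) / len(background)
    sigma_b = math.sqrt(sum((v - mu_b) ** 2 for v in background) / len(background))
    if sigma_b == 0:
        # A perfectly flat background gives an unbounded SCR.
        return float("inf")
    return abs(mu_t - mu_b) / sigma_b


# Usage: a checkerboard clutter of 0.1 / 0.3 with a bright 3 x 3 target.
img = [[0.1 if (x + y) % 2 == 0 else 0.3 for x in range(9)] for y in range(9)]
for y in range(3, 6):
    for x in range(3, 6):
        img[y][x] = 1.0
print(local_scr(img, top=3, left=3, h0=3, w0=3, d=2))
```

Measuring clutter only in a thin ring around the target reflects local contrast. This is what matters for small-target detection in scenes whose global statistics vary, for example with cloud cover.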