You Only Look Omni Gradient Backpropagation for Moving Infrared Small Target Detection

Guoyi Zhang, Guangsheng Xu, Siyang Chen, Han Wang, Xiaohu Zhang

TL;DR

Moving infrared small target detection suffers from severe target-background imbalance and weak per-frame representations. The authors propose BP-FPN, a backpropagation-driven feature pyramid that strengthens intra-frame representations via Gradient-Isolated Low-Level Shortcut (GILS) and Directional Gradient Regularization (DGR). The method delivers state-of-the-art results on multiple public IR datasets with negligible overhead and is plug-and-play with existing baselines. The authors provide theoretical justification for gradient isolation and cross-level gradient coupling, and extensive experiments demonstrate substantial gains, especially on hard scenarios. Overall, this work offers a principled, practical route to robust small-object detection in infrared video.

Abstract

Moving infrared small target detection is a key component of infrared search and tracking systems, yet it remains extremely challenging due to low signal-to-clutter ratios, severe target-background imbalance, and weak discriminative features. Existing deep learning methods primarily focus on spatio-temporal feature aggregation, but their gains are limited, revealing that the fundamental bottleneck lies in ambiguous per-frame feature representations rather than spatio-temporal modeling itself. Motivated by this insight, we propose BP-FPN, a backpropagation-driven feature pyramid architecture that fundamentally rethinks feature learning for small targets. BP-FPN introduces Gradient-Isolated Low-Level Shortcut (GILS) to efficiently incorporate fine-grained target details without inducing shortcut learning, and Directional Gradient Regularization (DGR) to enforce hierarchical feature consistency during backpropagation. The design is theoretically grounded, introduces negligible computational overhead, and can be seamlessly integrated into existing frameworks. Extensive experiments on multiple public datasets show that BP-FPN consistently establishes new state-of-the-art performance. To the best of our knowledge, it is the first FPN designed for this task entirely from the backpropagation perspective.

Paper Structure

This paper contains 24 sections, 17 equations, 11 figures, 8 tables.

Figures (11)

  • Figure 1: Comparison between our method and existing methods. Unlike existing methods focusing on spatiotemporal feature refinement and fusion, our method is motivated by the insight that the key to video object detection lies in modeling feature-to-feature correlations along the temporal dimension, and thus emphasizes learning stronger per-frame representations. Importantly, from a backpropagation perspective, the proposed method introduces near-zero additional computational overhead and can be seamlessly integrated with existing frameworks, consistently leading to measurable performance gains.
  • Figure 2: Comparison of the Macro Architectures of Different FPNs. Here, $C_i$ denotes backbone features, $P_i$ represents FPN features, and $P_i^{'}$ is the final output feature. The quantitative results are obtained on the IRDST dataset. (a) The vanilla FPN is commonly employed in current moving infrared small target detection models. It features low computational cost and memory footprint, making it suitable for inter-frame feature aggregation in videos. However, it suffers from information loss for small objects, which not only degrades detection performance but also slows down convergence. (b) The High-Resolution FPN employs shortcut connections that directly transmit low-level semantic cues, which compromises its generalization to unseen environments and significantly increases computational burden while lowering inference speed. (c) The Hourglass FPN simultaneously fuses features of different resolutions to strengthen multi-scale representation and small-object localization, but it remains affected by the optimization difficulties of FPNs. (d) The proposed BP-FPN is designed from the perspective of backpropagation, achieving significant performance improvement with negligible additional complexity.
  • Figure 3: Parameter–$\text{mAP}_{50}$, FLOPs–$\text{mAP}_{50}$ and FPS–$\text{mAP}_{50}$ plots on the IRDST dataset. By integrating our BP-FPN, we achieve a significant performance improvement with near-zero additional computational overhead, establishing a new state-of-the-art (SOTA).
  • Figure 4: Comparison between the proposed Gradient-Isolated Low-Level Shortcut and the vanilla low-level shortcut. Gradient isolation keeps the primary optimization direction consistent. Moreover, since only the parameters of the downsampling module in the $C_2$ branch receive gradient updates at this stage, the downsampling operation is able to refine small-object details from high-resolution feature maps.
  • Figure 5: A comparison between small objects in natural scenes [zhu2021detection] and infrared small targets [RDIAN] shows that infrared targets exhibit extreme sparsity. According to compressive sensing theory [baraniuk2010model], under such conditions, suitable compression can preserve small-target information while simultaneously reducing the dimensionality of feature maps, obviating the need to adhere strictly to the Nyquist sampling principle.
  • ...and 6 more figures
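
The gradient isolation described in the Figure 4 caption can be illustrated with a toy scalar example. This is only a sketch under stated assumptions, not the authors' implementation: it assumes that "gradient-isolated" means a stop-gradient on the $C_2$ shortcut (the role played by `tensor.detach()` in PyTorch or `jax.lax.stop_gradient` in JAX), and the `Value` class, the weight names, and the numbers are all hypothetical. With isolation, the backbone weight receives gradient only through the main path, while the downsampling weight on the shortcut is still updated.

```python
class Value:
    """Minimal scalar reverse-mode autodiff node (illustrative only)."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        # Each entry is (parent_node, local derivative d(self)/d(parent)).
        self.parents = tuple(parents)

    def __add__(self, other):
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    def detach(self):
        # Same forward value, but cut off from the graph (stop-gradient).
        return Value(self.data)

    def backward(self):
        # Post-order traversal gives a topological order of the graph.
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = 1.0
        # Propagate gradients from the output back to the leaves.
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


def shortcut_grads(isolate):
    """Return (backbone gradient, downsampling gradient) for a toy pyramid."""
    x = Value(2.0)           # hypothetical input
    w_backbone = Value(3.0)  # hypothetical backbone weight producing C2
    w_top = Value(0.5)       # hypothetical weight on the main pyramid path
    w_down = Value(4.0)      # hypothetical downsampling weight on the shortcut

    c2 = w_backbone * x
    main = w_top * c2
    # Assumption: gradient isolation == stop-gradient on the shortcut input.
    low = c2.detach() if isolate else c2
    out = main + w_down * low
    out.backward()
    return w_backbone.grad, w_down.grad


vanilla = shortcut_grads(isolate=False)
isolated = shortcut_grads(isolate=True)
print("vanilla shortcut :", vanilla)
print("isolated shortcut:", isolated)
```

In this toy setting the forward output is identical in both cases; only the backward pass differs, which is consistent with the caption's claim that the shortcut adds detail without redirecting the backbone's optimization.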