Point-to-Mask: From Arbitrary Point Annotations to Mask-Level Infrared Small Target Detection

Weihua Gao, Wenlong Niu, Jie Tang, Man Yang, Jiafeng Zhang, Xiaodong Peng

Abstract

Infrared small target detection (IRSTD) methods predominantly formulate the task as pixel-level segmentation, which requires costly dense annotations and is not well suited to tiny targets with weak texture and ambiguous boundaries. To address this issue, we propose Point-to-Mask, a framework that bridges low-cost point supervision and mask-level detection through two components: a Physics-driven Adaptive Mask Generation (PAMG) module that converts point annotations into compact target masks and geometric cues, and a lightweight Radius-aware Point Regression Network (RPR-Net) that reformulates IRSTD as target center localization and effective radius regression using spatiotemporal motion cues. The two modules form a closed loop: PAMG generates pseudo masks and geometric supervision during training, while the geometric predictions of RPR-Net are fed back to PAMG for pixel-level mask recovery during inference. To facilitate systematic evaluation, we further construct SIRSTD-Pixel, a sequential dataset with refined pixel-level annotations. Experiments show that the proposed framework achieves strong pseudo-label quality, high detection accuracy, and efficient inference, approaching full-supervision performance under point-supervised settings with substantially lower annotation cost. Code and datasets will be available at: https://github.com/GaoScience/point-to-mask.
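
To make the closed loop between masks and geometry concrete, the following is a minimal sketch of the two directions of the conversion: reducing a mask to a center and a radius (the kind of geometric supervision used for training), and expanding a predicted center and radius back into a pixel mask at inference. It rests on assumptions not stated in the abstract: the "effective radius" is taken here to be the radius of the disk with the same area as the mask, and mask recovery is approximated by a plain disk. The actual PAMG module uses physics priors on the image to recover masks, so the disk is only a stand-in. The function names `mask_to_geometry` and `geometry_to_mask` are hypothetical.

```python
import numpy as np


def mask_to_geometry(mask: np.ndarray) -> tuple[float, float, float]:
    """Reduce a binary mask to (center_y, center_x, radius).

    Assumption: the radius is the equal-area radius, r = sqrt(area / pi).
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask contains no target pixels")
    radius = np.sqrt(ys.size / np.pi)
    return float(ys.mean()), float(xs.mean()), float(radius)


def geometry_to_mask(shape: tuple[int, int], cy: float, cx: float, r: float) -> np.ndarray:
    """Expand (center, radius) into a disk-shaped boolean mask.

    Assumption: a plain disk stands in for the image-dependent mask
    recovery performed by PAMG.
    """
    yy, xx = np.ogrid[: shape[0], : shape[1]]
    return (yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2


# Usage: a 3x3 target in a 16x16 frame survives the round trip.
target = np.zeros((16, 16), dtype=bool)
target[5:8, 8:11] = True
cy, cx, r = mask_to_geometry(target)
recovered = geometry_to_mask(target.shape, cy, cx, r)
```

For very small, roughly compact targets, such a three-parameter description loses little information, which is one way to read the motivation for regressing geometry instead of segmenting pixels directly.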

Paper Structure (61 sections, 28 equations, 7 figures, 8 tables, 1 algorithm)

Figures (7)

  • Figure 1: Schematic illustration of the proposed framework. It integrates a Data Stream (top) and a Network Stream (bottom). The PAMG algorithm leverages physics priors to evolve random points into pixel-level masks, generating ground truth for the RPR-Net. The RPR-Net then predicts the target geometry parameters from input sequences using a lightweight point regression architecture. This establishes a robust "Mask-to-Point Supervision" and "Point-to-Mask Detection" cycle.
  • Figure 2: Visual analysis of PAMG robustness. (a) Comparison of masks generated from center ($\star$) and edge ($\times$) seeds, demonstrating that both cover the effective target core. (b) Posterior energy curves showing that despite an initial "climbing" lag for the edge seed, both trajectories converge to distinct but valid energy peaks after the warm-up phase.
  • Figure 3: The graphical user interface of the proposed Label-IRST annotation tool. The interface features a dual-view mechanism designed to reduce visual fatigue and unnecessary cursor movement. The Global View (left) displays the full-resolution raw infrared image for rapid Region of Interest selection. The Detail View (right) automatically renders the selected area in high magnification, allowing users to verify and refine the candidate mask (shown in red). The bottom panel provides controls for annotation parameters and visual enhancement settings.
  • Figure 4: Statistical characteristics of the SIRSTD-Pixel dataset. (a) Scale distribution: target areas are mainly concentrated within a small pixel range, consistent with infrared small target characteristics. (b) Saliency distribution: the SCR values span a broad range, reflecting the diversity of target saliency in the dataset.
  • Figure 5: Qualitative comparison of representative detection results on the SIRSTD-Pixel dataset. The GT panel shows the target location (top-right) and the corresponding ground-truth mask (bottom-right), displayed as a semi-transparent red overlay. Detection results from different methods are also visualized using semi-transparent red masks. Existing methods mainly suffer from three typical failure modes: missed detections under weak target responses, false alarms triggered by complex clutter, and inaccurate shape recovery with fragmented or biased masks. In the shown examples, the proposed method localizes targets more consistently and produces masks that more closely match the annotated thermal response region.
  • ...and 2 more figures