SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised Learning for Robust Infrared Small Target Detection

Yahao Lu; Yupei Lin; Han Wu; Xiaoyu Xian; Yukai Shi; Liang Lin

TL;DR

This paper proposes a negative augmentation approach that generates massive negatives for self-supervised learning, enriching diversity while maintaining semantic invariance in single-frame infrared small target (SIRST) detection.

Abstract

Single-frame infrared small target (SIRST) detection aims to recognize small targets against cluttered backgrounds. Recently, convolutional neural networks have achieved significant gains in general object detection. With the development of Transformers, SIRST models keep growing in scale, but because training samples are limited, performance has not improved accordingly. The quality, quantity, and diversity of the infrared dataset are critical to the detection of small targets. To address this issue, we propose a negative sample augmentation method that generates massive negatives for self-supervised learning. First, we apply a sequential noise modeling technique to generate realistic infrared data. Second, we fuse the extracted noise with the original data to promote diversity and fidelity in the generated data. Last, we propose a negative augmentation strategy to enrich diversity while maintaining semantic invariance. The proposed algorithm produces a synthetic SIRST-5K dataset, which contains massive pseudo-data and corresponding labels. With a rich diversity of infrared small target data, our algorithm significantly improves model performance and convergence speed. Compared with other state-of-the-art (SOTA) methods, our method achieves outstanding performance in terms of probability of detection (Pd), false-alarm rate (Fa), and intersection over union (IoU).
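The abstract names three metrics (Pd, Fa, IoU) without defining them. The sketch below shows one common way to compute them on binary masks, and it is not taken from the paper. It rests on assumptions: IoU and Fa are computed per pixel, and a ground-truth target counts as detected when at least one predicted pixel overlaps it, whereas published evaluations often use a centroid-distance threshold. All function names are hypothetical.

```python
import numpy as np

def iou(pred, gt):
    """Pixel-level intersection over union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return float(np.logical_and(pred, gt).sum() / union) if union else 1.0

def false_alarm_rate(pred, gt):
    """Fraction of all pixels predicted as target but labeled background."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return float(np.logical_and(pred, ~gt).sum() / pred.size)

def label_targets(mask):
    """Label 8-connected components with a simple flood fill."""
    mask = mask.astype(bool)
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue  # pixel already belongs to a labeled target
        count += 1
        labels[seed] = count
        stack = [seed]
        while stack:
            r, c = stack.pop()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < mask.shape[0] and 0 <= cc < mask.shape[1]
                            and mask[rr, cc] and not labels[rr, cc]):
                        labels[rr, cc] = count
                        stack.append((rr, cc))
    return labels, count

def detection_probability(pred, gt):
    """ASSUMPTION: a target is detected if any predicted pixel overlaps it."""
    labels, count = label_targets(gt)
    if count == 0:
        return 1.0
    pred = pred.astype(bool)
    hits = sum(bool(pred[labels == k].any()) for k in range(1, count + 1))
    return hits / count
```

Higher Pd and IoU are better, while lower Fa is better, so the three are usually reported together.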

Paper Structure (16 sections, 9 equations, 10 figures, 5 tables, 2 algorithms)

Introduction
RELATED WORK
Single-frame Infrared Small Target Detection
Data Augmentation for Representation Learning
Methodology
Real-world Noise2Noise Displacement
Negative Augmentation
Attention Feature Fusion (AFF)
Self-supervised Learning with Massive Negatives
EXPERIMENT
Synthetic SIRST-5K Dataset
Evaluation Metrics
Implementation Details
Comparison
Ablation Studies
...and 1 more section

Figures (10)

Figure 1: Self-supervised strategy based on negative augmentation. To explore target correspondence in SIRST, we propose a negative augmentation approach to generate massive negatives for self-supervised representation learning.
Figure 2: The quality, quantity, and diversity of the infrared data have a significant impact on the detection of small targets. By applying a self-supervised learning paradigm with massive pseudo-data, our negative generation strategy achieves a faster convergence rate, lower training loss, and better mean $IoU$.
Figure 3: The qualitative results of different SIRST detection methods. The correctly detected targets, false alarms, and weak detection regions are highlighted using red, yellow, and green dashed circles respectively. Our method performs accurate target localization with a lower false alarm rate.
Figure 4: Illustration of the proposed massive negatives synthesis framework. (a) Noise Sampling and Modeling module. The input image is first divided into equally sized local regions. Qualified noise sampling regions are selected and resized to fetch real-world noise. Then, the training samples are mixed with real-world noise; (b) Negative Augmentation module. Among the generated samples, the challenging small targets are further processed with negative augmentation to produce massive negatives; (c) Self-supervised Learning. By utilizing these negative samples and their corresponding labels, we can implement self-supervised representation learning to learn richer feature representations.
Figure 5: A demonstration of noise sampling in an infrared small target detection dataset. High-variance noise can affect the model's ability to recognize small targets. The yellow dashed circle highlights the introduced texture. By selecting the noise-prone region ($A_{noise}$), our framework can fetch diverse noise in infrared sensors to facilitate realistic sample augmentation in SIRST-5K.
...and 5 more figures
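
The noise sampling steps in the Figure 4(a) caption (divide the image into equally sized regions, select a qualified region, resize it, mix it with the training sample) can be sketched roughly as follows. This is an illustrative sketch under assumptions, not the authors' implementation. The captions do not say how a region qualifies, so the lowest-variance rule here is a placeholder. The nearest-neighbor resize, the mean subtraction, the additive blend, and the `patch` and `strength` parameters are likewise assumptions, and all function names are hypothetical.

```python
import numpy as np

def split_into_patches(image, patch):
    """Split a 2-D image into non-overlapping patch x patch regions."""
    h, w = image.shape
    rows, cols = h // patch, w // patch
    cropped = image[:rows * patch, :cols * patch]
    return (cropped.reshape(rows, patch, cols, patch)
                   .swapaxes(1, 2)
                   .reshape(rows * cols, patch, patch))

def select_noise_patch(patches):
    """ASSUMPTION: treat the lowest-variance region as the qualified one."""
    variances = patches.reshape(len(patches), -1).var(axis=1)
    return patches[int(np.argmin(variances))]

def resize_nearest(patch, shape):
    """Nearest-neighbor resize to `shape`, using index lookup only."""
    rows = np.arange(shape[0]) * patch.shape[0] // shape[0]
    cols = np.arange(shape[1]) * patch.shape[1] // shape[1]
    return patch[np.ix_(rows, cols)]

def mix_with_noise(image, patch=8, strength=0.5):
    """ASSUMPTION: add zero-mean noise from one sampled region to the image."""
    image = image.astype(np.float64)
    region = select_noise_patch(split_into_patches(image, patch))
    noise = resize_nearest(region - region.mean(), image.shape)
    return np.clip(image + strength * noise, 0.0, 255.0)
```

Because the mixing step leaves target positions untouched, the original label mask can be reused for each synthesized sample.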
