SARATR-X: Toward Building A Foundation Model for SAR Target Recognition
Weijie Li, Wei Yang, Yuenan Hou, Li Liu, Yongxiang Liu, Xiang Li
TL;DR
This work addresses the lack of foundation models for SAR ATR by introducing SARATR-X, a self-supervised foundation model trained on a large unlabeled SAR corpus to enable scalable, label-efficient adaptation across SAR target recognition tasks. A diverse pre-training dataset (SARDet-180K) aggregates 186,600 SAR target samples from 14 open-source datasets, spanning multiple targets, scenes, and sensors, to support broad SAR generalization. SARATR-X uses a SAR-tailored HiViT backbone and a two-step pre-training pipeline (SSL-ImageNet initialization followed by SAR-focused masked image modeling with multi-scale gradient features) to mitigate speckle noise and preserve small-target information. Evaluations demonstrate strong performance on few-shot classification, robustness across operating conditions, and multi-dataset detection, often rivaling or surpassing existing supervised, semi-supervised, or self-supervised methods, with results and code released publicly to spur further SAR foundation-model research.
Abstract
Despite the remarkable progress in synthetic aperture radar automatic target recognition (SAR ATR), recent efforts have concentrated on detecting and classifying a specific category, e.g., vehicles, ships, airplanes, or buildings. One of the fundamental limitations of the top-performing SAR ATR methods is that the learning paradigm is supervised, task-specific, limited-category, closed-world learning. This paradigm depends on massive amounts of accurately annotated samples, which are expensive to obtain because they must be labeled by expert SAR analysts, and it has limited generalization capability and scalability. In this work, we make the first attempt towards building a foundation model for SAR ATR, termed SARATR-X. SARATR-X learns generalizable representations via self-supervised learning (SSL) and provides a cornerstone for label-efficient model adaptation to generic SAR target detection and classification tasks. Specifically, SARATR-X is trained on 0.18 M unlabelled SAR target samples, which are curated by combining contemporary benchmarks and constitute the largest publicly available dataset to date. Considering the characteristics of SAR images, a backbone tailored for SAR ATR is carefully designed, and a two-step SSL method endowed with multi-scale gradient features is applied to ensure the feature diversity and model scalability of SARATR-X. The capabilities of SARATR-X are evaluated on classification under few-shot and robustness settings and on detection across various categories and scenes. It achieves impressive performance, often competitive with or even superior to prior fully supervised, semi-supervised, or self-supervised algorithms. Our SARATR-X and the curated dataset are released at https://github.com/waterdisappear/SARATR-X to foster research into foundation models for SAR image interpretation.
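To illustrate why gradient-type features can suit SAR imagery, the following is a minimal sketch of multi-scale gradient features that could serve as a reconstruction target in masked image modeling. It is not the released SARATR-X code. The abstract does not say how the gradients are computed, so the sketch assumes a ratio-of-averages form (the log of the ratio between mean intensities in neighbouring windows), chosen because speckle is commonly modeled as multiplicative noise. The function names `box_mean`, `log_ratio_gradient`, and `multi_scale_gradient_features`, the window sizes, and `eps` are all hypothetical.

```python
import numpy as np


def box_mean(img, k):
    """Mean over a k x k window (k odd) centred on each pixel, edge-padded."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    # Integral image with a leading zero row and column.
    ii = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
    ii[1:, 1:] = p.cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    s = ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]
    return s / (k * k)


def log_ratio_gradient(img, k, eps=1e-6):
    """Log-ratio of window means on opposite sides of each pixel (assumed form)."""
    m = box_mean(img, k)
    d = k // 2 + 1  # offset so the two windows do not overlap the centre pixel
    mp = np.pad(m, d, mode="edge")
    h, w = img.shape
    right = mp[d:d + h, 2 * d:2 * d + w]
    left = mp[d:d + h, 0:w]
    down = mp[2 * d:2 * d + h, d:d + w]
    up = mp[0:h, d:d + w]
    gx = np.log((right + eps) / (left + eps))
    gy = np.log((down + eps) / (up + eps))
    return gx, gy


def multi_scale_gradient_features(img, scales=(3, 5, 9)):
    """Stack horizontal and vertical log-ratio gradients for each window size."""
    feats = []
    for k in scales:
        feats.extend(log_ratio_gradient(img, k))
    return np.stack(feats)  # shape: (2 * len(scales), H, W)


# Usage on a synthetic intensity image: left half 1.0, right half 4.0.
step = np.ones((32, 32))
step[:, 16:] = 4.0
features = multi_scale_gradient_features(step)
print(features.shape)
```

Because each feature is a ratio of local means, multiplying the whole image by a constant gain leaves the features essentially unchanged, whereas a difference-based gradient would scale with the gain. Larger windows average out more speckle at the cost of spatial detail, which is one reason to combine several scales.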
