Hierarchical Disentanglement-Alignment Network for Robust SAR Vehicle Recognition
Weijie Li, Wei Yang, Wenpeng Zhang, Tianpeng Liu, Yongxiang Liu, Li Liu
TL;DR
This work tackles robust SAR vehicle recognition under diverse operating conditions and limited data by introducing HDANet, a three-module framework that disentangles target features from background clutter and aligns them into domain-invariant representations. The first module generates domain data with three augmentations, the second uses a multitask-assisted mask to emphasize target regions, and the third performs capsule-based domain alignment with a SimSiam-inspired contrastive loss. Extensive experiments on the MSTAR dataset across nine operating conditions demonstrate state-of-the-art robustness and effective clutter suppression, with ablations confirming each component’s contribution. The findings suggest that combining targeted feature disentanglement with domain-aware alignment can enable reliable SAR ATR in open-world settings, and they point to self-supervised strategies as future work for addressing data scarcity.
Abstract
Vehicle recognition is a fundamental problem in SAR image interpretation. However, robustly recognizing vehicle targets in SAR images is challenging because intraclass variations are large and interclass variations are small. The lack of large datasets further complicates the task. Inspired by the analysis of target signature variations and deep learning explainability, this paper proposes a novel domain alignment framework named the Hierarchical Disentanglement-Alignment Network (HDANet) to achieve robustness under various operating conditions. In brief, HDANet integrates feature disentanglement and alignment into a unified framework with three modules: domain data generation, multitask-assisted mask disentanglement, and domain alignment of target features. The first module generates diverse data for alignment, using three simple but effective data augmentation methods designed to simulate target signature variations. The second module disentangles the target features from background clutter using the multitask-assisted mask, which prevents clutter from interfering with subsequent alignment. The third module employs a contrastive loss for domain alignment to extract robust target features from the generated diverse data and the disentangled features. The proposed method demonstrates impressive robustness across nine operating conditions in the MSTAR dataset, and extensive qualitative and quantitative analyses validate the effectiveness of the framework.
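
To make the interplay of the second and third modules concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes flattened feature vectors, a binary target mask, and a SimSiam-style symmetric negative cosine similarity as the alignment loss. The names `apply_mask`, `cosine_similarity`, and `alignment_loss` are hypothetical, the predictor head is replaced by the identity, and the stop-gradient used in SimSiam is only noted in a comment because plain Python has no autodiff.

```python
import math

def apply_mask(features, mask):
    """Suppress clutter by element-wise multiplication with a target mask."""
    return [f * m for f, m in zip(features, mask)]

def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (max(norm_a, eps) * max(norm_b, eps))

def alignment_loss(p1, z1, p2, z2):
    """Symmetric negative cosine similarity in the style of SimSiam.

    p1, p2 stand for predictor outputs and z1, z2 for projections. In an
    autodiff framework z1 and z2 would be wrapped in a stop-gradient;
    that detail is omitted here (assumption of this sketch).
    """
    return -0.5 * (cosine_similarity(p1, z2) + cosine_similarity(p2, z1))

# Two hypothetical views of one target: the first two entries are the
# target response, the last two are clutter that differs between domains.
view_a = [0.9, 0.8, 0.1, 0.4]
view_b = [0.9, 0.8, 0.5, 0.0]
target_mask = [1, 1, 0, 0]

masked_a = apply_mask(view_a, target_mask)
masked_b = apply_mask(view_b, target_mask)

# Identity predictor (p == z) is a simplification made for this sketch.
loss_raw = alignment_loss(view_a, view_a, view_b, view_b)
loss_masked = alignment_loss(masked_a, masked_a, masked_b, masked_b)

print("loss without mask:", loss_raw)
print("loss with mask:   ", loss_masked)
```

In this toy setting the masked views coincide, so the masked loss reaches its minimum of approximately -1, while the unmasked loss stays higher because the differing clutter entries lower the cosine similarity. This mirrors the abstract's rationale for disentangling target features before alignment.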
