Hierarchical Disentanglement-Alignment Network for Robust SAR Vehicle Recognition

Weijie Li, Wei Yang, Wenpeng Zhang, Tianpeng Liu, Yongxiang Liu, Li Liu

TL;DR

This work tackles robust SAR vehicle recognition under diverse operating conditions and limited data by introducing HDANet, a three-module framework that jointly disentangles target features from clutter and aligns domain-invariant representations. It leverages domain data generation via three augmentations, multitask-assisted mask disentanglement to emphasize target regions, and capsule-based domain alignment with a SimSiam-inspired contrastive loss. Extensive experiments on the MSTAR dataset across SOC and nine EOCs demonstrate state-of-the-art robustness and effective clutter suppression, with ablations confirming each component’s contribution. The findings highlight the potential of combining targeted feature disentanglement with domain-aware alignment to enable reliable SAR ATR in open-world settings, and point to future self-supervised strategies to address data scarcity.

Abstract

Vehicle recognition is a fundamental problem in SAR image interpretation. However, robustly recognizing vehicle targets is a challenging task in SAR due to the large intraclass variations and small interclass variations. Additionally, the lack of large datasets further complicates the task. Inspired by the analysis of target signature variations and deep learning explainability, this paper proposes a novel domain alignment framework named the Hierarchical Disentanglement-Alignment Network (HDANet) to achieve robustness under various operating conditions. Concisely, HDANet integrates feature disentanglement and alignment into a unified framework with three modules: domain data generation, multitask-assisted mask disentanglement, and domain alignment of target features. The first module generates diverse data for alignment, and three simple but effective data augmentation methods are designed to simulate target signature variations. The second module disentangles the target features from background clutter using the multitask-assisted mask to prevent clutter from interfering with subsequent alignment. The third module employs a contrastive loss for domain alignment to extract robust target features from generated diverse data and disentangled features. Lastly, the proposed method demonstrates impressive robustness across nine operating conditions in the MSTAR dataset, and extensive qualitative and quantitative analyses validate the effectiveness of our framework.
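
The second and third modules described above can be illustrated with a small, framework-free sketch: a target mask is multiplied element-wise with a feature map, and the masked features of two views are compared with a symmetrized negative cosine similarity in the style of SimSiam. This is only a toy under stated assumptions, not the authors' implementation. The function names, the hand-written mask, and the numbers are hypothetical; the capsule conversion is replaced by a plain flatten, the predictor head is omitted, and the segmentation and $l_1$ terms that assist the mask are not modelled.

```python
import math

def apply_mask(features, mask):
    """Element-wise product of a 2D feature map and a soft target mask in [0, 1]."""
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(features, mask)]

def flatten(feature_map):
    """Stand-in for the capsule conversion: simply flatten the map to a vector."""
    return [v for row in feature_map for v in row]

def cosine_similarity(u, v, eps=1e-8):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

def alignment_loss(p1, z1, p2, z2):
    """Symmetrized negative cosine similarity (SimSiam style).

    p1, p2 play the role of predictor outputs and z1, z2 the role of
    projections. In an autodiff framework z1 and z2 would be treated as
    constants (stop-gradient); plain Python has no gradients, so that
    step is only noted here.
    """
    return -0.5 * (cosine_similarity(p1, z2) + cosine_similarity(p2, z1))

# Two hypothetical views of the same target; the off-diagonal cells hold the
# target response, the diagonal cells hold clutter that differs between views.
view_a = [[0.2, 0.9], [0.8, 0.1]]
view_b = [[0.6, 0.8], [0.9, 0.5]]
target_mask = [[0.0, 1.0], [1.0, 0.0]]  # assumed given; learned in the paper

raw_a, raw_b = flatten(view_a), flatten(view_b)
masked_a = flatten(apply_mask(view_a, target_mask))
masked_b = flatten(apply_mask(view_b, target_mask))

# Predictor omitted in this sketch, so p and z coincide for each view.
loss_raw = alignment_loss(raw_a, raw_a, raw_b, raw_b)
loss_masked = alignment_loss(masked_a, masked_a, masked_b, masked_b)
print(f"raw: {loss_raw:.4f}, masked: {loss_masked:.4f}")
```

In this toy case the masked features give a lower loss than the raw ones, because the clutter cells that differ between the two views no longer contribute to the comparison. This mirrors the stated motivation for disentangling before aligning.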

Paper Structure (42 sections, 10 equations, 12 figures, 5 tables)

Figures (12)

  • Figure 1: The main challenges of robust SAR vehicle recognition. Sub-figure (a) provides a taxonomy of the challenges arising from intraclass variations, interclass variations, and data collection. The right sub-figure illustrates typical variations using SAR images from the MSTAR dataset. Panels (b1), (b2), and (b3) show large intraclass variations caused by sensitivity to operating conditions. (b1) shows how the target signatures and shadows marked by the red dashed line vary with depression angle. (b2) shows how the partial target structures marked by the red dashed line vary with azimuth angle; for example, the T-72 tank gun barrel is most visible in the vertical line-of-sight direction. (b3) illustrates how intensity variations in different background clutter affect the adjacent target signatures. SAR images of the same category therefore exhibit large intraclass variation across operating conditions. Finally, sub-figure (b4) shows the small interclass variations between fine-grained vehicle categories: visual differences between SAR target signatures are much smaller than those in natural images, and the images shown look similar but belong to three different target categories.
  • Figure 2: Attention module results of CBAM. The Spatial Attention Mechanism (SAM) in CBAM generates masks based on the pooling results of feature maps. The SAM mask assigns more weight to the background clutter than to the target, which indicates that data bias can affect mask learning.
  • Figure 3: The overall framework of HDANet. Sub-figure (a) shows the framework and its three modules: domain data generation, multitask-assisted mask disentanglement, and domain alignment of target features. Domain data generation uses three data augmentation methods. Mask disentanglement consists of an encoder and a decoder. The feature maps are multiplied by the target mask to obtain target features, and these disentangled target features are used for classification and domain alignment. A segmentation task and an $l_1$ loss assist in learning the target mask. In the domain alignment module, target feature maps are converted to capsule vectors, and a contrastive loss based on cosine similarity, computed as in SimSiam, is used to enhance feature robustness. The parameters and structure are the same on both sides. The right sub-figures show (b) the details of the ConvBlocks (BN is batch normalization) and (c) the list of symbols.
  • Figure 4: Feature maps of different padding methods (feature maps on the left are from MVGGNet [15] with zero padding, and those on the right are from HDANet (ours) with mirror padding; the top input is an MSTAR image, and the bottom input is an all-zero image). Models with zero padding produce target-independent edge and center artifacts in the left feature maps [48].
  • Figure 5: Radar charts of experimental results (see Table \ref{table_result} for detailed numbers). Our method performs more robustly than others under various operating conditions.
  • ...and 7 more figures
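
The padding contrast in the Figure 4 caption can be illustrated with a one-dimensional toy example: a mean filter is run over a constant signal padded with zeros and with a mirrored copy of the signal. This is a sketch under assumptions rather than a reproduction of the figure. The helper names are hypothetical, "mirror padding" is taken here to mean reflection without repeating the edge sample, and a single averaging filter stands in for the convolutional layers of a real network.

```python
def pad_zero(row, k):
    """Pad k zeros on each side."""
    return [0.0] * k + list(row) + [0.0] * k

def pad_mirror(row, k):
    """Reflect the signal about its end samples (requires k < len(row))."""
    left = [row[i] for i in range(k, 0, -1)]
    right = [row[-1 - i] for i in range(1, k + 1)]
    return left + list(row) + right

def mean_filter(padded, width):
    """Sliding-window average; output length is len(padded) - width + 1."""
    return [sum(padded[i:i + width]) / width
            for i in range(len(padded) - width + 1)]

# A constant signal contains no structure, so any variation in the output
# is introduced by the padding rather than by the content.
row = [1.0] * 6
width = 3
k = width // 2

out_zero = mean_filter(pad_zero(row, k), width)
out_mirror = mean_filter(pad_mirror(row, k), width)

print("zero padding:  ", [round(v, 3) for v in out_zero])
print("mirror padding:", [round(v, 3) for v in out_mirror])
```

With zero padding the two border outputs drop below the interior value even though the input is constant, which is the kind of content-independent edge response the caption describes. With mirror padding the output stays constant across the whole signal.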