
SDF-Net: Structure-Aware Disentangled Feature Learning for Optical-SAR Ship Re-identification

Furui Chen, Han Wang, Yuhan Sun, Jianing You, Yixuan Lv, Zhuang Zhou, Hong Tan, Shengyang Li

Abstract

Cross-modal ship re-identification (ReID) between optical and synthetic aperture radar (SAR) imagery is fundamentally challenged by the severe radiometric discrepancy between passive optical imaging and coherent active radar sensing. While existing approaches primarily rely on statistical distribution alignment or semantic matching, they often overlook a critical physical prior: ships are rigid objects whose geometric structures remain stable across sensing modalities, whereas texture appearance is highly modality-dependent. In this work, we propose SDF-Net, a Structure-Aware Disentangled Feature Learning Network that systematically incorporates geometric consistency into optical-SAR ship ReID. Built upon a ViT backbone, SDF-Net introduces a structure consistency constraint that extracts scale-invariant gradient energy statistics from intermediate layers to robustly anchor representations against radiometric variations. At the terminal stage, SDF-Net disentangles the learned representations into modality-invariant identity features and modality-specific characteristics. These decoupled cues are then integrated through a parameter-free additive residual fusion, effectively enhancing discriminative power. Extensive experiments on the HOSS-ReID dataset demonstrate that SDF-Net consistently outperforms existing state-of-the-art methods. The code and trained models are publicly available at https://github.com/cfrfree/SDF-Net.
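
To make the idea of "scale-invariant gradient energy statistics" concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that intermediate ViT patch tokens can be reshaped onto their spatial grid, that gradient energy is measured with simple finite differences, and that scale invariance is obtained by normalizing the energy map to sum to one. The function names `gradient_energy` and `structure_consistency_loss` and the L1-style comparison are illustrative assumptions; the exact formulation in the paper may differ.

```python
import numpy as np


def gradient_energy(tokens, grid_hw, eps=1e-12):
    """Normalized gradient-energy map of a patch-token grid (hypothetical helper).

    tokens  : array of shape (H*W, C), patch tokens from an intermediate layer
    grid_hw : (H, W), the spatial layout of the patches
    returns : array of shape (H-1, W-1) that sums to (approximately) 1
    """
    h, w = grid_hw
    fmap = tokens.reshape(h, w, -1)
    # Forward finite differences along the vertical and horizontal axes.
    dy = fmap[1:, :-1] - fmap[:-1, :-1]
    dx = fmap[:-1, 1:] - fmap[:-1, :-1]
    # Energy per location: squared gradient magnitude summed over channels.
    energy = (dx ** 2 + dy ** 2).sum(axis=-1)
    # Dividing by the total removes any global feature-scale factor.
    return energy / (energy.sum() + eps)


def structure_consistency_loss(opt_tokens, sar_tokens, grid_hw):
    """Mean absolute difference between the two normalized energy maps.

    The L1 comparison is an assumption made for this sketch.
    """
    e_opt = gradient_energy(opt_tokens, grid_hw)
    e_sar = gradient_energy(sar_tokens, grid_hw)
    return float(np.abs(e_opt - e_sar).mean())
```

Under these assumptions, multiplying all tokens of one modality by a constant leaves its energy map unchanged, so the loss reacts only to where structure appears on the grid and not to how strongly a given sensor responds.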

Paper Structure

This paper contains 32 sections, 10 equations, 6 figures, 6 tables, 1 algorithm.

Figures (6)

  • Figure 1: Illustration of the optical-SAR ship re-identification task. The framework aims to match the same ship identity across vastly different optical and SAR sensor modalities.
  • Figure 2: Architectural pipeline of the proposed SDF-Net. The framework is logically structured into four sequential stages: (a) Input Stage: Optical and SAR images undergo a cross-modal tokenization strategy to neutralize low-level sensor discrepancies. (b) Intermediate Stage: Geometric stability is enforced via the Structure-Aware Consistency Learning (SCL) module, which extracts and aligns intermediate gradient energy to anchor representations on modality-invariant structural primitives. (c) Terminal Stage: The final representations are refined by the Disentangled Feature Learning (DFL) module, explicitly decoupling shared identity embeddings from sensor-specific variations before integrating them through an additive residual fusion. (d) Inference Stage: The frozen network leverages the robust fused representations to execute accurate bidirectional cross-modal ship retrieval.
  • Figure 3: Hyper-parameter sensitivity analysis of SDF-Net. The heatmaps illustrate the variation of mAP (left) and Rank-1 (right) accuracies (in %) under different combinations of the orthogonality constraint weight $\lambda_{\text{orth}}$ and the structure consistency weight $\lambda_{\text{struct}}$.
  • Figure 4: Grad-CAM visualization of the spatial attention maps generated by SDF-Net. From left to right within each group: the input optical image, the corresponding optical attention map, the SAR attention map, and the input SAR image. The network consistently focuses on the modality-invariant ship hull structure, effectively suppressing optical sea clutter and penetrating SAR speckle noise.
  • Figure 5: Visual evolution of feature heatmaps across different Transformer layers. For each modality group, the columns from left to right represent the original input image followed by the feature activation maps extracted from layers 2, 4, 6, 8, 10, and 12. Intermediate representations at layer 6 successfully isolate the modality-invariant geometric structure, whereas shallow layers are corrupted by sensor noise and deep layers suffer from spatial semantic collapse.
  • ...and 1 more figure
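
The terminal-stage decoupling and additive fusion described for Figure 2(c), together with the two loss weights varied in Figure 3, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the released code: the linear projection matrices `w_shared` and `w_specific`, the squared-cosine form of the orthogonality penalty, and the weighted-sum form of the total objective are all hypothetical choices made here for clarity. Only the element-wise addition of the two branches reflects the "parameter-free additive residual fusion" named in the captions.

```python
import numpy as np


def disentangle_and_fuse(feat, w_shared, w_specific):
    """Split features into two branches and fuse them by plain addition.

    feat       : (B, D) final backbone features
    w_shared   : (D, K) hypothetical projection for modality-invariant identity cues
    w_specific : (D, K) hypothetical projection for modality-specific cues
    """
    shared = feat @ w_shared
    specific = feat @ w_specific
    # The fusion step itself adds no learnable parameters.
    fused = shared + specific
    return shared, specific, fused


def orthogonality_penalty(shared, specific, eps=1e-12):
    """Mean squared cosine similarity between the two branches (assumed form)."""
    s = shared / (np.linalg.norm(shared, axis=1, keepdims=True) + eps)
    p = specific / (np.linalg.norm(specific, axis=1, keepdims=True) + eps)
    cos = (s * p).sum(axis=1)
    return float((cos ** 2).mean())


def total_loss(l_id, l_orth, l_struct, lambda_orth, lambda_struct):
    """Assumed weighted sum: identity loss plus the two weighted regularizers."""
    return l_id + lambda_orth * l_orth + lambda_struct * l_struct
```

In this sketch the penalty is zero when the shared and specific embeddings of every sample are orthogonal and approaches one when they are collinear, which is the behavior an orthogonality constraint weighted by $\lambda_{\text{orth}}$ would be expected to encourage.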