SARES-DEIM: Sparse Mixture-of-Experts Meets DETR for Robust SAR Ship Detection

Fenghao Song, Shaojing Yang, Xi Zhou

Abstract

Ship detection in Synthetic Aperture Radar (SAR) imagery is fundamentally challenged by inherent coherent speckle noise, complex coastal clutter, and the prevalence of small-scale targets. Conventional detectors, primarily designed for optical imagery, often exhibit limited robustness against SAR-specific degradation and suffer from the loss of fine-grained ship signatures during spatial downsampling. To address these limitations, we propose SARES-DEIM, a domain-aware detection framework grounded in the DEtection TRansformer (DETR) paradigm. Central to our approach is SARESMoE (SAR-aware Expert Selection Mixture-of-Experts), a module leveraging a sparse gating mechanism to selectively route features toward specialized frequency and wavelet experts. This sparsely activated architecture effectively filters speckle noise and semantic clutter while maintaining high computational efficiency. Furthermore, we introduce the Space-to-Depth Enhancement Pyramid (SDEP) neck to preserve high-resolution spatial cues from shallow stages, significantly improving the localization of small targets. Extensive experiments on two benchmark datasets demonstrate the superiority of SARES-DEIM. Notably, on the challenging HRSID dataset, our model achieves a mAP50:95 of 76.4% and a mAP50 of 93.8%, outperforming state-of-the-art YOLO-series and specialized SAR detectors.
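
The abstract attributes the loss of small ship signatures to spatial downsampling and names a space-to-depth operation as the remedy. The internals of SDEP are not given here, so the following is only a minimal sketch of the generic space-to-depth rearrangement, not the authors' implementation. The function name `space_to_depth`, the `[C][H][W]` nested-list layout, and the toy 4x4 feature map are all assumptions made for illustration.

```python
def space_to_depth(x, block=2):
    """Rearrange a [C][H][W] nested list into [C*block*block][H//block][W//block].

    Hypothetical helper: every input value is kept, and spatial resolution
    is traded for extra channels instead of being discarded.
    """
    channels = len(x)
    height = len(x[0])
    width = len(x[0][0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    # One output channel per (row offset, column offset, input channel) triple.
    for dy in range(block):
        for dx in range(block):
            for c in range(channels):
                plane = [
                    [x[c][i][j] for j in range(dx, width, block)]
                    for i in range(dy, height, block)
                ]
                out.append(plane)
    return out


# Toy single-channel 4x4 map with one bright "ship" pixel at row 1, column 2.
feature = [[[0, 0, 0, 0],
            [0, 0, 9, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]]

# Plain stride-2 subsampling keeps only even rows and columns: the pixel is lost.
subsampled = [row[::2] for row in feature[0][::2]]

# Space-to-depth halves the resolution as well, but the pixel survives in one
# of the four output channels.
packed = space_to_depth(feature, block=2)
```

In this toy case `subsampled` contains only zeros, while `packed` still holds the bright pixel in one of its four 2x2 channels. That is the property that makes the rearrangement attractive for targets only a few pixels wide.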

Paper Structure

This paper contains 22 sections, 4 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: SARES-DEIM overview. The architecture focuses on domain-specific feature enhancement and high-resolution spatial cue preservation.
  • Figure 2: Qualitative detection comparisons on the HRSID dataset. The rows from top to bottom represent the Ground Truth (GT), DEIM Baseline (Base), and SARES-DEIM (Ours), respectively, across six representative maritime samples. Green boxes denote GT annotations, while blue boxes denote predicted bounding boxes. Yellow and red ellipses highlight instances of missed targets and false detections, respectively. Best viewed zoomed in and in color.
  • Figure 3: Expert-level CAM visualizations on HRSID (Pure MoE validation without SDEP). (a) Homogeneous (SharedExpert Only); (b) P3: Spatial Expert Only; (c) P3: Wavelet Expert Only; (d) P4: Frequency Expert Only; (e) P4: Hybrid Expert Only; (f) P5: Frequency Expert Only; (g) P5: Hybrid Expert Only; (h) Uniform Gating (non-sparse equal weighting); (i) Full SARESMoE (Proposed). Warmer colors indicate stronger feature activations.
  • Figure 4: Module-level ablation visualizations on HRSID, corresponding to key configurations in the ablation table. Each column displays the detection bounding boxes (top) and corresponding CAM heatmaps (bottom) for: (a) Baseline; (b) Baseline + SARESMoE; (c) Baseline + SDEP; and (d) full SARES-DEIM. Warmer colors in heatmaps represent stronger feature responses.
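
Figure 3 contrasts uniform gating (h), where every expert receives equal weight, with the sparse gating of the full SARESMoE (i). The routing rule itself is not spelled out in this excerpt, so the sketch below assumes a common sparse mixture-of-experts formulation: keep the top-k router logits, renormalize them with a softmax, and skip the remaining experts. The function names, the four toy experts, and the logit values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def uniform_gate(num_experts):
    """Non-sparse baseline: every expert gets the same weight."""
    return [1.0 / num_experts] * num_experts


def top_k_gate(logits, k):
    """Assumed sparse rule: softmax over the k largest logits, zero elsewhere."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in keep])
    weights = [0.0] * len(logits)
    for index, prob in zip(keep, probs):
        weights[index] = prob
    return weights


def mix(experts, weights, x):
    """Weighted sum of expert outputs; zero-weight experts are never called."""
    return sum(w * expert(x) for expert, w in zip(experts, weights) if w > 0.0)


# Four toy scalar "experts" standing in for the specialized branches.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: x * x, lambda x: -x]
router_logits = [2.0, 0.5, 2.0, -1.0]  # hypothetical router scores for one feature

sparse_weights = top_k_gate(router_logits, k=2)
uniform_weights = uniform_gate(len(experts))

sparse_output = mix(experts, sparse_weights, 3.0)
uniform_output = mix(experts, uniform_weights, 3.0)
```

With k=2, only two of the four experts are evaluated for this input, which is where the compute saving of a sparsely activated design comes from. Uniform gating evaluates all four and also lets low-scoring experts dilute the result.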