PhysDNet: Physics-Guided Decomposition Network of Side-Scan Sonar Imagery

Can Lei, Hayat Rajani, Nuno Gracias, Rafael Garcia, Huigang Wang

TL;DR

PhysDNet addresses the challenge of view-dependent SSS imagery by disentangling the observed intensity into physically meaningful fields: seabed reflectivity $\rho$, terrain elevation $z$, and acoustic path loss $L$. It employs a physics-guided, three-branch encoder–decoder with a Lambertian-based reconstruction $\hat{I}=\rho\cdot\cos\theta\cdot L$ to enable self-supervised training without ground-truth maps. The method introduces a geometry-aware coordinate system, cosine-angle computation, and shadow detection via angle monotonicity, coupled with a three-stage loss curriculum that leverages weak priors and shadow geometry. Experimental results show stable, interpretable decompositions (recovering $\rho$, $z$, and $L$) and superior shadow boundary recovery compared to baselines, demonstrating improved physical consistency and utility for registration and shadow interpretation in SSS analysis.

Abstract

Side-scan sonar (SSS) imagery is widely used for seafloor mapping and underwater remote sensing, yet the measured intensity is strongly influenced by seabed reflectivity, terrain elevation, and acoustic path loss. This entanglement makes the imagery highly view-dependent and reduces the robustness of downstream analysis. In this letter, we present PhysDNet, a physics-guided multi-branch network that decouples SSS images into three interpretable fields: seabed reflectivity, terrain elevation, and propagation loss. By embedding the Lambertian reflection model, PhysDNet reconstructs sonar intensity from these components, enabling self-supervised training without ground-truth annotations. Experiments show that the decomposed representations preserve stable geological structures, capture physically consistent illumination and attenuation, and produce reliable shadow maps. These findings demonstrate that physics-guided decomposition provides a stable and interpretable domain for SSS analysis, improving both physical consistency and downstream tasks such as registration and shadow interpretation.
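The Lambertian reconstruction $\hat{I}=\rho\cdot\cos\theta\cdot L$ that drives the self-supervised training can be sketched in NumPy as below. This is a minimal illustration, not the authors' code. The function names are hypothetical, clipping $\cos\theta$ at zero for facets facing away from the sonar is our assumption, and the plain L1 reconstruction term is a stand-in for the paper's three-stage loss curriculum, whose individual terms are not given here.

```python
import numpy as np


def reconstruct_intensity(rho, cos_theta, path_loss):
    """Lambertian forward model I_hat = rho * cos(theta) * L, applied per pixel.

    All inputs are arrays of the same shape (pings x range bins).
    Assumption: cos(theta) is clipped to [0, 1], so facets facing away from
    the sonar contribute no backscatter.
    """
    rho = np.asarray(rho, dtype=float)
    cos_theta = np.clip(np.asarray(cos_theta, dtype=float), 0.0, 1.0)
    path_loss = np.asarray(path_loss, dtype=float)
    return rho * cos_theta * path_loss


def reconstruction_loss(observed, rho, cos_theta, path_loss, mask=None):
    """Mean absolute error between the observed and reconstructed intensity.

    Hypothetical stand-in for the self-supervised reconstruction term; the
    optional boolean mask restricts the average to selected pixels.
    """
    observed = np.asarray(observed, dtype=float)
    residual = np.abs(observed - reconstruct_intensity(rho, cos_theta, path_loss))
    if mask is not None:
        residual = residual[np.asarray(mask, dtype=bool)]
    return float(residual.mean())
```

The product is unchanged if one factor is scaled up and another scaled down by the same amount (for example, doubling $\rho$ and halving $L$), so the reconstruction error alone cannot pin down the individual fields. This is one reason additional constraints, such as the weak priors and shadow geometry mentioned in the TL;DR, are needed.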

Paper Structure

This paper contains 22 sections, 8 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of the proposed PhysDNet framework and the geometric definitions used in the physics-aware model. (a) PhysDNet employs a three-branch architecture to decouple SSS images into reflectivity ($\rho$), terrain elevation ($z$), and path loss ($L$), guided by the Lambertian model. A threshold-based shadow map $S(x,y)$ provides weak supervision, while the predicted elevation is used to compute $\cos\theta$ physically and to derive a physics-driven shadow map $\hat{S}(x,y)$. (b) At ping index $i$, the transducer at $O=(0, y_i, 0)$ emits toward seafloor points $P(x_{ij}, y_i, z_{ij})$. The reflection angle $\theta$ is defined between the acoustic incidence vector $\overrightarrow{OP}$ and the local surface normal (computed from the neighboring points $P_1$ and $P_2$). The propagation angle $\varphi$ is the angle of the acoustic ray measured from the vertical axis. Points $P$, $p_2$, and $p_3$ illustrate the angular monotonicity rule for shadow detection, where $p_3$ lies in the acoustic shadow.
  • Figure 2: Detailed architecture of a single branch in PhysDNet. It consists of an initial DoubleConv block for feature extraction, four downsampling stages (DoubleConv + MaxPooling), and four upsampling stages (Upsampling + Skip Connection + DoubleConv). Two AttentionBlocks are inserted at intermediate layers to enhance contextual features. The final OutConv layer produces the predicted physical map. Feature dimensions and module configurations are annotated in the diagram.
  • Figure 3: Visualization of the outputs from the multi-branch physically decoupled network. Each of the five columns corresponds to one test case. From top to bottom: (a) raw side-scan sonar image; (b) predicted acoustic intensity map; (c) predicted seabed reflectivity map; (d) predicted cosine of the reflection angle ($\cos \theta$); (e) predicted acoustic path loss map; (f) predicted seabed elevation map (converted from depth); (g) shadow mask inferred from the predicted elevation.
  • Figure 4: Shadow segmentation comparison. From top to bottom: input SSS image, manual ground truth (GT), ours (shadows derived from the predicted $z$), adaptive threshold, OCRNet, BiSeNet, Mask2Former, and SegFormer. Our method best matches the GT and recovers fine, low-contrast shadows.
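The geometry of Figure 1(b) and the angle-monotonicity shadow rule can be sketched in NumPy as follows. This is a minimal sketch under our own assumptions, not the authors' implementation. We assume the z-axis points up with the transducer at $z=0$ (so the seabed has $z<0$), that $x$ is the horizontal across-track distance from nadir on one side of the swath, sorted outward, and that the surface normal is taken from finite differences via `np.gradient`, which approximates the direction of $(P_1-P)\times(P_2-P)$ on a regular grid. We also use the vector from $P$ back to $O$ rather than $\overrightarrow{OP}$ so that $\cos\theta$ is positive for facets facing the sonar. The names `cos_theta_from_elevation` and `shadow_mask_from_elevation` are hypothetical.

```python
import numpy as np


def cos_theta_from_elevation(z, x, dy):
    """Cosine of the reflection angle for every seabed cell (sketch).

    Assumed conventions (not taken from the paper):
      z  : (n_pings, n_bins) elevation, z-axis up, transducer at z = 0 (seabed z < 0).
           np.gradient needs n_pings >= 2 and n_bins >= 2.
      x  : (n_bins,) horizontal across-track distance of each bin from nadir.
      dy : along-track spacing between consecutive pings.
    Ping i has its transducer at O = (0, y_i, 0); each cell is P = (x_j, y_i, z_ij).
    """
    z = np.asarray(z, dtype=float)
    x = np.asarray(x, dtype=float)
    # Slopes along track (axis 0, spacing dy) and across track (axis 1, coordinates x).
    dz_dy, dz_dx = np.gradient(z, dy, x)
    # Upward normal (-dz/dx, -dz/dy, 1), approximately parallel to (P1 - P) x (P2 - P)
    # when P1 is the next range bin and P2 the next ping.
    nx, ny, nz = -dz_dx, -dz_dy, np.ones_like(z)
    # Vector from P back to the transducer O is (-x, 0, -z).
    vx = -np.broadcast_to(x, z.shape)
    vz = -z
    dot = vx * nx + vz * nz  # the along-track component of P->O is zero
    norm = np.sqrt(vx**2 + vz**2) * np.sqrt(nx**2 + ny**2 + nz**2)
    return dot / norm


def shadow_mask_from_elevation(z, x, tol=1e-9):
    """Angle-monotonicity shadow test (sketch).

    phi is the angle between the ray O->P and the downward vertical. Walking
    outward from nadir along one ping, every visible point has phi at least as
    large as that of all nearer points; a point whose phi falls below the
    running maximum lies behind nearer terrain and is marked as shadow.
    Assumes x is sorted in increasing order (one side of the swath).
    """
    z = np.asarray(z, dtype=float)
    x = np.broadcast_to(np.asarray(x, dtype=float), z.shape)
    phi = np.arctan2(np.abs(x), -z)
    running_max = np.maximum.accumulate(phi, axis=1)
    return phi < running_max - tol
```

For example, over a flat seabed 10 units below the transducer with bins at $x = 1, \dots, 5$ and a 4-unit mound at $x = 2$, only the bin at $x = 3$ is marked as shadow: the ray to it passes below the mound top, while the rays to the farther bins clear it again. The running maximum keeps the test linear in the number of range bins per ping.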