SonarSweep: Fusing Sonar and Vision for Robust 3D Reconstruction via Plane Sweeping

Lingpeng Chen, Jiakun Tang, Apple Pui-Yi Chui, Ziyang Hong, Junfeng Wu

TL;DR

The paper tackles robust 3D underwater reconstruction under challenging turbidity by fusing imaging sonar and camera data. It introduces SonarSweep, an end-to-end framework that adapts deep plane sweep to cross-modal fusion, back-projecting sonar features onto sonar-aligned planes and warping them into the camera view to build a multi-modal cost volume. Through extensive sim-to-real experiments, it demonstrates state-of-the-art dense depth accuracy and robustness across distance and turbidity, and releases a synchronized stereo-camera and imaging sonar dataset along with code. This approach holds practical significance for reliable autonomous underwater perception and mapping, especially in visually degraded environments.

Abstract

Accurate 3D reconstruction in visually degraded underwater environments remains a formidable challenge. Single-modality approaches are insufficient: vision-based methods fail due to poor visibility and geometric constraints, while sonar is crippled by inherent elevation ambiguity and low resolution. Consequently, prior fusion techniques rely on heuristics and flawed geometric assumptions, leading to significant artifacts and an inability to model complex scenes. In this paper, we introduce SonarSweep, a novel, end-to-end deep learning framework that overcomes these limitations by adapting the principled plane sweep algorithm for cross-modal fusion between sonar and visual data. Extensive experiments in both high-fidelity simulation and real-world environments demonstrate that SonarSweep consistently generates dense and accurate depth maps, significantly outperforming state-of-the-art methods across challenging conditions, particularly in high turbidity. To foster further research, we will publicly release our code and a novel dataset featuring synchronized stereo-camera and sonar data, the first of its kind.

Paper Structure

This paper contains 33 sections, 13 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: The SonarSweep System. (Left) The experimental AUV in a challenging underwater environment. (Top Right) The integrated camera and sonar sensor suite. (Bottom Right) Conceptual diagram of the fusion approach.
  • Figure 2: The Forward-Looking Sonar (FLS) sensor model. A 3D point $\bm{P_s}$ is measured by its range $d$ and bearing $\theta$. The elevation angle $\phi$ is collapsed during the projection, leading to ambiguity along a circular arc.
  • Figure 3: An overview of the SonarSweep pipeline. From a synchronized sonar and camera image pair, we extract feature maps using parallel encoders. The core of our method involves hypothesizing $N$ candidate planes, onto which 2D sonar features are back-projected and differentiably warped into the camera's view. These warped feature maps are concatenated with the camera feature map to construct a multi-modal cost volume, which is regularized and regressed to produce a dense depth map. The colored lines in the central feature maps highlight the fundamental matching principle, where a structure finds its strongest correspondence only at the correct depth plane ($d_i$); the feature maps themselves are illustrative visualizations of the high-dimensional vectors learned by the encoders.
  • Figure 4: (a) Geometric parameterization of a candidate plane, defined by an inclination angle $\alpha$ and a distance $d_i$. (b) Illustration of our projective-consistent sampling. $I^m$ and $I^n$ are 3D points back-projected from the 2D sonar measurement $I$. This principle is applied consistently for all measurements (e.g., $J$, $K$). To create uniform steps in the camera's pixel space, the hypothesized planes must be sampled in a geometric progression ($d_{i+1} = k \cdot d_i$).
  • Figure 5: From left to right: the simulated underwater world in OceanSim with varied water conditions; the physical lab pool setup; a real-world sensor suite; and the corresponding high-fidelity output from our digital twin.
  • ...and 4 more figures
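
The captions of Figures 2 and 4 describe two geometric ideas that a short sketch may make more concrete: the elevation ambiguity of the sonar measurement, and the geometric progression of plane distances ($d_{i+1} = k \cdot d_i$). The Python below is only an illustration under stated assumptions, not the authors' implementation. The axis convention (x forward, y lateral, z vertical), the function names, and the way $k$ is derived from a hypothetical near distance, far distance, and plane count are all assumptions introduced here; the excerpt does not state how the paper chooses them.

```python
import math


def sonar_backproject(d, theta, phi):
    """Return the 3D point at range d, bearing theta, and elevation phi.

    Assumed convention (not taken from the paper): x forward, y lateral,
    z vertical, angles in radians.
    """
    x = d * math.cos(phi) * math.cos(theta)
    y = d * math.cos(phi) * math.sin(theta)
    z = d * math.sin(phi)
    return x, y, z


def sonar_project(x, y, z):
    """Project a 3D point to the sonar measurement (range, bearing).

    The elevation angle is not part of the output, which is the source of
    the ambiguity along a circular arc described in Figure 2.
    """
    d = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x)
    return d, theta


def geometric_plane_distances(d_min, d_max, n):
    """Return n plane distances with a constant ratio k between neighbors.

    d_min, d_max, and n are hypothetical parameters used only to pick k so
    that the progression d_{i+1} = k * d_i spans [d_min, d_max].
    """
    if n < 2 or d_min <= 0 or d_max <= d_min:
        raise ValueError("need n >= 2 and 0 < d_min < d_max")
    k = (d_max / d_min) ** (1.0 / (n - 1))
    return [d_min * k ** i for i in range(n)]


if __name__ == "__main__":
    # Two points with different elevation angles on the same arc
    # produce the same (range, bearing) measurement.
    for phi in (0.10, -0.15):
        point = sonar_backproject(5.0, 0.3, phi)
        print(phi, sonar_project(*point))

    # Candidate plane distances spaced in a geometric progression.
    print(geometric_plane_distances(0.5, 8.0, 5))
```

In this sketch, hypothesizing a plane plays the role of picking one point on the ambiguity arc for each sonar measurement; a plane sweep then repeats that choice for each candidate distance in the progression.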