AONeuS: A Neural Rendering Framework for Acoustic-Optical Sensor Fusion

Mohamad Qadri; Kevin Zhang; Akshay Hinduja; Michael Kaess; Adithya Pediredla; Christopher A. Metzler

TL;DR

This work tackles underwater 3D surface reconstruction under restricted baselines by fusing optical RGB imagery with imaging sonar through a neural rendering framework. It introduces Acoustic-Optical NeuS (AONeuS), which extends implicit surface representations by using a shared Signed Distance Function along with modality-specific renderers for camera and sonar measurements, optimized via differentiable rendering with a multi-term loss and a two-stage weight schedule. Evaluations on synthetic and real datasets show AONeuS consistently outperforms RGB-only (NeuS) and sonar-only (NeuSIS) baselines, particularly at small baselines, and analyses reveal improved forward-model conditioning with multimodal data. The approach advances practical underwater perception by enabling high-fidelity 3D surface reconstructions when motion and baselines are severely constrained, and it provides public data and code to facilitate further research.

Abstract

Underwater perception and 3D surface reconstruction are challenging problems with broad applications in construction, security, marine archaeology, and environmental monitoring. Treacherous operating conditions, fragile surroundings, and limited navigation control often dictate that submersibles restrict their range of motion and, thus, the baseline over which they can capture measurements. In the context of 3D scene reconstruction, it is well known that smaller baselines make reconstruction more challenging. Our work develops a physics-based multimodal acoustic-optical neural surface reconstruction framework (AONeuS) capable of effectively integrating high-resolution RGB measurements with low-resolution depth-resolved imaging sonar measurements. By fusing these complementary modalities, our framework can reconstruct accurate high-resolution 3D surfaces from measurements captured over heavily restricted baselines. Through extensive simulations and in-lab experiments, we demonstrate that AONeuS dramatically outperforms recent RGB-only and sonar-only surface reconstruction methods based on inverse differentiable rendering. A website visualizing the results of our paper is located at https://aoneus.github.io/

Paper Structure (30 sections, 14 equations, 12 figures, 11 tables)

Introduction
Related Work
Camera Imaging
Sonar Imaging
Neural Rendering
Multimodal Imaging
Background
Imaging Sonars
Image Formation Model of an Imaging Sonar
Image Formation Model of an Optical Camera
Problem Statement
Method
Acoustic-Optical NeuS
Loss Function
Weight Scheduling
...and 15 more sections

Figures (12)

Figure 1: Acoustic-Optical Measurement Processes. (a) RGB measurement process and example measurement. Scene points along a common ray passing through the camera center map to the same pixel on the image plane. (b) Sonar measurement process and example measurement. In a sonar image, the azimuth $\theta$ and range $r$ of the imaged object are resolved. However, the elevation information $\phi$ is lost; all objects located along the elevation arc (in blue) map to the same pixel.
Figure 2: Acoustic-Optical Measurement Ambiguities. (a) Two RGB measurements captured over a limited baseline struggle to localize a point along the depth-axis. (b) Two sonar measurements captured over a limited baseline struggle to localize a point along the x-axis. Because they have orthogonal ambiguities, RGB and sonar measurements are highly complementary.
Figure 3: AONeuS Reconstruction Framework. A shared surface-geometry SDF network $\mathbf{N}$ is used in combination with modality-specific neural rendering modules. For each sampled point $\mathbf{x}$ along an acoustic or optical ray, $\mathbf{N}$ outputs the signed distance, its gradient, and two feature vectors, $\mathbf{F}^{\text{son}}$ and $\mathbf{F}^{\text{cam}}$, which serve as inputs to their respective rendering networks. $\mathbf{D}^{\text{son}}$ and $\mathbf{D}^{\text{cam}}$ are the directions of the acoustic and optical rays, respectively.
Figure 4: Simulation setup. We visualize the orientations of the camera and the sonar relative to the scene, here a turtle, in our simulation. The optical axis of the camera, the $z$ axis of its own local coordinate frame, is aligned with the $Z$ axis of the world coordinate frame. The elevation axis of the sonar, the $z$ axis of its own local coordinate frame, is aligned with the $-X$ axis of the world coordinate frame. Additionally, we visualize the image planes of the camera and sonar, the 2D planes onto which they project the 3D scene points.
Figure 5: Experimental hardware setup. (a) Test water tank used to conduct the experiments and its dimensions. (b) Test object. (c) Bluefin Hovering Autonomous Underwater Vehicle (HAUV) and its mounted hardware (Didson imaging sonar and Doppler Velocity Log (DVL)). (d) FLIR Blackfly S GigE camera used for image capture and its watertight enclosure.
...and 7 more figures
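
The complementary ambiguities described in the captions of Figures 1 and 2 can be illustrated with a minimal sketch. It is not the paper's forward model: the pinhole camera with unit focal length, the spherical sonar convention ($x = r\cos\theta\cos\phi$, $y = r\sin\theta\cos\phi$, $z = r\sin\phi$, with the local $z$ axis as the elevation axis), and the function names `camera_project`, `sonar_project`, and `sonar_point` are all assumptions made for illustration.

```python
import numpy as np

def camera_project(p, focal=1.0):
    """Pinhole projection; z is the optical axis (assumed convention)."""
    x, y, z = p
    return np.array([focal * x / z, focal * y / z])

def sonar_project(p):
    """Return (range, azimuth) of a point; the elevation angle is discarded."""
    x, y, z = p
    r = np.sqrt(x * x + y * y + z * z)
    theta = np.arctan2(y, x)
    return np.array([r, theta])

def sonar_point(r, theta, phi):
    """Hypothetical helper: spherical (r, azimuth, elevation) to Cartesian."""
    return np.array([
        r * np.cos(theta) * np.cos(phi),
        r * np.sin(theta) * np.cos(phi),
        r * np.sin(phi),
    ])

# Two points at different depths on the same camera ray share one pixel.
p = np.array([0.2, -0.1, 2.0])
same_ray = 1.5 * p
cam_a, cam_b = camera_project(p), camera_project(same_ray)

# Two points on the same elevation arc share one sonar (range, azimuth) bin.
q_a = sonar_point(2.0, 0.3, -0.1)
q_b = sonar_point(2.0, 0.3, 0.1)
son_a, son_b = sonar_project(q_a), sonar_project(q_b)

print("camera pixels equal:", np.allclose(cam_a, cam_b))
print("sonar bins equal:", np.allclose(son_a, son_b))
```

In this sketch the camera cannot separate `p` from `same_ray` (depth is lost), while the sonar cannot separate `q_a` from `q_b` (elevation is lost). Each sensor resolves the coordinate the other one discards, which is the intuition behind fusing the two.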
