Learning Which Side to Scan: Multi-View Informed Active Perception with Side Scan Sonar for Autonomous Underwater Vehicles

Advaith V. Sethuraman; Philip Baldoni; Katherine A. Skinner; James McMahon

TL;DR

The paper tackles efficient underwater object classification with side-scan sonar by formulating Adaptive Surveying and Reacquisition (ASR) as a graph-based active perception problem. It introduces an Angular View Graph (AVG) to encode multiple viewing angles and relations, a Graph Multi-View ATR (GMVATR) for joint view-based recognition, and a Deep Q-Network policy to select the next best view, all evaluated in a photorealistic sonar simulator. Key contributions include the AVG formulation, a novel reward structure that minimizes the number of views while ensuring correct classification, and extensive experiments showing improved accuracy, coverage rate, and classification efficiency over state-of-the-art baselines. The approach promises more efficient autonomous missions in underwater exploration, archaeology, and environmental monitoring, with future work focusing on sim2real transfer and expanding the action space for more flexible reacquisition strategies.

Abstract

Autonomous underwater vehicles often perform surveys that capture multiple views of targets in order to provide more information for human operators or automatic target recognition algorithms. In this work, we address the problem of choosing the most informative views that minimize survey time while maximizing classifier accuracy. We introduce a novel active perception framework for multi-view adaptive surveying and reacquisition using side scan sonar imagery. Our framework addresses this challenge by using a graph formulation for the adaptive survey task. We then use Graph Neural Networks (GNNs) to both classify acquired sonar views and to choose the next best view based on the collected data. We evaluate our method using simulated surveys in a high-fidelity side scan sonar simulator. Our results demonstrate that our approach is able to surpass the state-of-the-art in classification accuracy and survey efficiency. This framework is a promising approach for more efficient autonomous missions involving side scan sonar, such as underwater exploration, marine archaeology, and environmental monitoring.

Paper Structure (30 sections, 2 equations, 4 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Graph Neural Networks for Robotics
Multi-View Classification
Next Best View Planning
Technical Approach
Problem Formulation
Surveying
Reacquisition
Adaptive Survey and Reacquisition
Multi-View ATR as an Angular View-Graph
Angular View Graph
Multi-View ATR
View-Q Function
Reinforcement Learning Details
...and 15 more sections

Figures (4)

Figure 1: a) An Iver3 autonomous underwater vehicle equipped with a Klein 3500 side scan sonar system. b) A real cylindrical target imaged in side scan sonar is shown in the red boxes. Side scan sonar image appearance is highly dependent on viewing angle and target geometry, both of which determine how much acoustic energy returns to the receiver.
Figure 2: a) The traditional Surveying and Reacquisition process. First a comprehensive survey of the region is performed (lawnmower pattern shown), then the discovered contacts (shown as red dots) are reacquired and inspected. b) The proposed strategy in this paper is to combine the Surveying and Reacquisition stages and inform the Reacquisition planner with the most informative views. By avoiding uninformative reacquisition survey legs, we can reduce the total time for clearing a search region.
Figure 3: Our proposed framework. For a side scan view captured at an angle of $\theta$ degrees, we produce feature embedding $f(\theta)$ using a CNN. Then we form the angular view-graph which consists of the feature embeddings and angular constraint edges $\phi$. The graph is used for both Multi-View Automatic Target Recognition tasks and as the state of a reinforcement learning (DQN) agent. The DQN agent informs the planner of the next best views to capture. The process is repeated and the graph is incrementally expanded until the DQN agent stops reacquisition. The target classification decision is computed using the final angular view-graph.
Figure 4: Photorealistic side scan sonar images produced by our sonar simulator. Each target's orientation, size, and terrain are randomized, and the target is then imaged using a simulated OID pattern with $K=6$, pass length $L=75$ m, radius $R=12$ m, height from bottom (HFB) of 3 m, and resolution of 0.02 m/pixel. Note how the sonar images change as a function of viewing angle. Side scan sonar images are single-channel images of acoustic intensities, but a gold color palette is applied for better visibility.
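
The loop described in the Figure 3 caption (embed a view, add it to the angular view-graph, ask the agent for the next view, repeat until the agent stops) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the names `AngularViewGraph`, `angular_difference`, `run_reacquisition`, `embed`, and `q_values` are hypothetical, the edge value $\phi$ is assumed here to be the smallest absolute difference between two viewing angles, and `embed` and `q_values` are toy stand-ins for the CNN and the DQN agent.

```python
import math
from dataclasses import dataclass, field

STOP = "stop"  # hypothetical action label meaning "end reacquisition"


def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


@dataclass
class AngularViewGraph:
    """Nodes hold per-view feature embeddings; edges hold angular offsets (assumed phi)."""
    angles: list = field(default_factory=list)    # theta for each node, in degrees
    features: list = field(default_factory=list)  # f(theta) for each node
    edges: dict = field(default_factory=dict)     # (i, j) -> phi between nodes i and j

    def add_view(self, theta_deg, feature):
        """Add one view and connect it to every existing view."""
        new = len(self.angles)
        for i, theta_i in enumerate(self.angles):
            self.edges[(i, new)] = angular_difference(theta_i, theta_deg)
        self.angles.append(theta_deg % 360.0)
        self.features.append(list(feature))
        return new


def run_reacquisition(first_view, candidates, embed, q_values, max_views=6):
    """Grow the graph greedily until the agent picks STOP or the view budget runs out."""
    graph = AngularViewGraph()
    graph.add_view(first_view, embed(first_view))
    while len(graph.angles) < max_views:
        remaining = [a for a in candidates if a not in graph.angles]
        scores = q_values(graph, remaining)       # maps each action to a score
        action = max(scores, key=scores.get)      # greedy choice over actions
        if action == STOP:
            break
        graph.add_view(action, embed(action))
    return graph


# Toy stand-ins (assumptions for illustration only, not learned models).
def embed(theta_deg):
    r = math.radians(theta_deg)
    return [math.cos(r), math.sin(r)]


def q_values(graph, remaining, stop_score=100.0):
    # Score a candidate by its distance to the nearest view already in the graph.
    scores = {a: min(angular_difference(a, t) for t in graph.angles) for a in remaining}
    scores[STOP] = stop_score
    return scores


if __name__ == "__main__":
    g = run_reacquisition(0, [0, 60, 120, 180, 240, 300], embed, q_values)
    print(g.angles, g.edges)
```

In this sketch the graph serves both roles named in the caption: it is the input a classifier would read at the end, and it is the state the view-selection function scores at every step. A learned agent would replace the toy `q_values`, and a learned encoder would replace `embed`.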
