SONIC: Sonar Image Correspondence using Pose Supervised Learning for Imaging Sonars

Samiran Gode, Akshay Hinduja, Michael Kaess

TL;DR

SONIC tackles data association for underwater SLAM by learning sonar-specific image correspondences through pose supervision. It introduces a sonar epipolar contour-based loss and a cyclic consistency loss within a coarse-to-fine, differentiable matching network, trained primarily on simulated data. Across simulation and tank experiments, SONIC outperforms AKAZE and LightGlue in inlier rate and downstream pose accuracy, demonstrating stronger resilience to viewpoint changes in polar sonar images. This work advances loop-closure reliability and place recognition in sonar-based SLAM, with future directions including open-water validation and extending to multiple sonar modes and cross-platform matching.

Abstract

In this paper, we address the challenging problem of data association for underwater SLAM through a novel method for sonar image correspondence using learned features. We introduce SONIC (SONar Image Correspondence), a pose-supervised network designed to yield robust feature correspondences that withstand viewpoint variations. The inherent complexity of the underwater environment stems from dynamic and frequently limited visibility conditions, which restrict vision to a few meters of often featureless expanses. This makes camera-based systems suboptimal in most open-water application scenarios. Consequently, multibeam imaging sonars emerge as the preferred choice for perception sensors. However, they are not without limitations. While imaging sonars offer superior long-range visibility compared to cameras, their measurements can appear different from varying viewpoints. This inherent variability presents formidable challenges in data association, particularly for feature-based methods. Our method demonstrates significantly better performance in generating correspondences for sonar images, which will pave the way for more accurate loop closure constraints and sonar-based place recognition. Code as well as simulated and real-world datasets will be made public to facilitate further development in the field.

Paper Structure (14 sections, 13 equations, 6 figures, 2 tables)


Figures (6)

  • Figure 1: Real-world matching performance: Sonar images taken from different planar positions in a test tank show our method providing significantly better matches than both AKAZE with a brute-force matcher and SuperPoint keypoints matched using LightGlue. Given the keypoints in the first frame, SONIC uses expectation matching to determine the correspondences and presents only those correspondences with high confidence.
  • Figure 2: The basic imaging sonar sensor model of a point feature. Each pixel provides direct measurements of the bearing/azimuth ($\theta$) and range ($r$), but the elevation angle ($\phi$) is lost in the projection onto the image plane, analogous to the loss of range in the perspective projection of a camera. The imaged volume, called the frustum, is defined by the sensor's limits in azimuth $\left[\theta_{min},\theta_{max}\right]$, range $\left[r_{min},r_{max}\right]$, and elevation $\left[\phi_{min},\phi_{max}\right]$.
  • Figure 3: Sonar Epipolar Geometry: The elevation arc of a point in the first image is transformed into the frame of the second image and then projected, which creates an epipolar contour.
  • Figure 4: Network architecture highlights: a) For each query point $x_{1}$, its corresponding location $\hat{x}_{2}$ (given by the expected correspondence equation) is represented as the expectation of a distribution computed from the correlation between the feature descriptors. The associated uncertainty also helps in reweighting the training loss. During training, keypoints serve as queries. b) Searching for a correspondence across the entire image is costly. The location of the correspondence $p^{c}$ at the coarse level is used to determine a local window at the fine level; $p^{f}$ is then found within this window using differentiable matching.
  • Figure 5: Loss functions: The yellow point $x_{1}$ represents a queried keypoint in the first image. The red cross $\hat{x}_{2}$ is the predicted point. The yellow dotted line represents the sampled points on the epipolar contour of point $x_{1}$. $L_{epipolar}$, the epipolar loss, is the shortest distance from the predicted point to the epipolar contour. $L_{cyclic}$ is the cyclic loss, which asserts that a feature point mapped to the second image and back lands close to its original position.
  • ...and 1 more figures
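
The sensor model of Figure 2 and the epipolar contour of Figure 3 can be illustrated with a short sketch. It is not the authors' implementation: the spherical-coordinate convention, the function names, the relative pose given as a rotation `R` and translation `t`, and the choice to measure distance in Cartesian image-plane coordinates are all assumptions made for illustration. The sketch samples the unknown elevation arc of a pixel in the first image, moves the samples into the second sonar frame, re-projects them to (range, azimuth), and returns the shortest distance from a predicted point to that sampled contour.

```python
import math

def backproject(r, theta, phi):
    """Spherical sonar coordinates -> 3D point in the sensor frame (assumed convention)."""
    return (r * math.cos(theta) * math.cos(phi),
            r * math.sin(theta) * math.cos(phi),
            r * math.sin(phi))

def project(p):
    """3D point -> (range, azimuth). The elevation angle is discarded."""
    x, y, z = p
    return math.sqrt(x * x + y * y + z * z), math.atan2(y, x)

def transform(R, t, p):
    """Apply p2 = R @ p1 + t, with R a 3x3 nested list and t a 3-vector."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def epipolar_contour(r, theta, R, t, phi_min, phi_max, n=32):
    """Sample the elevation arc of pixel (r, theta) and project it into image 2.

    n must be at least 2 so that both elevation limits are sampled.
    """
    contour = []
    for k in range(n):
        phi = phi_min + (phi_max - phi_min) * k / (n - 1)
        contour.append(project(transform(R, t, backproject(r, theta, phi))))
    return contour

def polar_to_xy(r, theta):
    """(range, azimuth) -> Cartesian image-plane coordinates."""
    return r * math.cos(theta), r * math.sin(theta)

def epipolar_distance(pred, contour):
    """Shortest distance from a predicted (range, azimuth) to the sampled contour.

    Assumption: distance is Euclidean in Cartesian image-plane coordinates.
    """
    px, py = polar_to_xy(*pred)
    return min(math.hypot(px - cx, py - cy)
               for cx, cy in (polar_to_xy(*c) for c in contour))
```

With an identity relative pose the contour collapses to the original pixel, since range and azimuth do not depend on elevation in the same frame. The contour only spreads out once the two viewpoints differ, which is what makes it usable as a pose-derived supervision signal. A fuller version would also discard samples that fall outside the second sonar's frustum.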
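
The expectation matching described in the Figure 4 caption can be sketched in a similar spirit. This is a minimal illustration under stated assumptions, not the network's actual code: the function name, the dot-product correlation, the softmax with a `temperature` parameter, and the use of the distribution's total variance as the uncertainty are all hypothetical choices. The query descriptor is correlated with every candidate descriptor in the second image, the scores are normalized into a probability distribution, and the predicted correspondence is the expected location under that distribution.

```python
import math

def expected_correspondence(query, descriptors, locations, temperature=1.0):
    """Return the expected match location and a scalar uncertainty.

    query:       descriptor of the query point in image 1 (list of floats)
    descriptors: candidate descriptors in image 2 (list of lists)
    locations:   (u, v) location of each candidate in image 2
    """
    # Correlation score between the query and every candidate (dot product).
    scores = [sum(q * d for q, d in zip(query, desc)) / temperature
              for desc in descriptors]

    # Numerically stable softmax over the scores.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    probs = [w / z for w in weights]

    # Expected location under the matching distribution.
    eu = sum(p * u for p, (u, v) in zip(probs, locations))
    ev = sum(p * v for p, (u, v) in zip(probs, locations))

    # Total variance around the expectation, used here as the uncertainty.
    var = sum(p * ((u - eu) ** 2 + (v - ev) ** 2)
              for p, (u, v) in zip(probs, locations))
    return (eu, ev), var
```

Because every step is a smooth function of the descriptors, a formulation like this stays differentiable, so losses defined on the predicted location can propagate gradients back to the features. A low variance indicates a peaked distribution and hence a confident match, which is one way high-confidence correspondences could be selected. Restricting `descriptors` and `locations` to a local window around a coarse match gives the fine-level search described in part b) of the caption.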