Improved Image-based Pose Regressor Models for Underwater Environments

Luyuan Peng, Hari Vishnu, Mandar Chitre, Yuen Min Too, Bharath Kalyan, Rajat Mishra

TL;DR

This work tackles underwater visual relocalization by regressing a 6-DOF pose (3-D position plus orientation) from monocular RGB imagery. It uses PoseNet and its LSTM-augmented variant, PoseLSTM, as regressors and trains them with a composite loss $\mathcal{L} = \mathcal{L}_{p} + \beta \mathcal{L}_{q}$ with $\beta=30$, where $\mathcal{L}_{p}=\|\mathbf{p}-\hat{\mathbf{p}}\|_2$ is the position error and $\mathcal{L}_{q}=\|\mathbf{q}-\hat{\mathbf{q}}\|_2$ is the orientation error for position $\mathbf{p}$ and quaternion orientation $\mathbf{q}$, with hats denoting predictions. Validation is performed on simulator and controlled tank datasets, including stereo augmentation that adds right-camera images to the training data. Results show high pose accuracy in both simulated and water-tank environments, with data augmentation and the LSTM-based regression offering robustness to underwater visual distortions and turbidity. These findings support the method’s potential for reliable underwater navigation and inspection using cost-effective monocular cameras.
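The composite loss above can be illustrated with a small sketch. It is not the authors' training code: the function names `l2_distance` and `pose_loss` are hypothetical, and poses are assumed to be plain tuples, with position as `(x, y, z)` and orientation as a quaternion `(w, x, y, z)`.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pose_loss(p_true, p_pred, q_true, q_pred, beta=30.0):
    """Composite pose loss L = L_p + beta * L_q.

    L_p is the L2 position error and L_q is the L2 quaternion error.
    beta=30 matches the value quoted in the summary above; the argument
    layout is an assumption made for this illustration.
    """
    position_loss = l2_distance(p_true, p_pred)
    orientation_loss = l2_distance(q_true, q_pred)
    return position_loss + beta * orientation_loss


# A prediction that is 5 m off in position with a perfect orientation.
loss = pose_loss(
    p_true=(0.0, 0.0, 0.0),
    p_pred=(3.0, 4.0, 0.0),
    q_true=(1.0, 0.0, 0.0, 0.0),
    q_pred=(1.0, 0.0, 0.0, 0.0),
)
print(loss)
```

The weight $\beta$ balances the two terms, since position error is measured in metres while quaternion error is unitless and bounded. Implementations of this kind of loss often normalize the predicted quaternion to unit length before comparing it; the formula as stated here does not, so the sketch omits that step.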

Abstract

We investigate the performance of image-based pose regressor models in underwater environments for relocalization. Leveraging PoseNet and PoseLSTM, we regress a 6-degree-of-freedom pose from single RGB images with high accuracy. Additionally, we explore data augmentation with stereo camera images to improve model accuracy. Experimental results demonstrate that the models achieve high accuracy in both simulated and clear waters, promising effective real-world underwater navigation and inspection applications.
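The abstract does not say how right-camera images are labeled when they are added to the training set. One plausible approach, sketched here purely as an assumption, is to derive each right-camera pose from the left-camera pose by shifting the position along the camera's local x-axis by the stereo baseline and keeping the orientation unchanged. The function names, the `(w, x, y, z)` quaternion convention, and the 0.1 m baseline are all hypothetical.

```python
import math


def rotate_by_quaternion(q, v):
    """Rotate 3-vector v by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * (q_vec x v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    # v' = v + w * t + (q_vec x t)
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


def right_camera_pose(left_position, left_orientation, baseline_m):
    """Label for a right-camera image, derived from the left-camera pose.

    Assumes a rectified stereo rig: the right camera sits baseline_m
    metres along the left camera's local x-axis and shares its orientation.
    """
    offset = rotate_by_quaternion(left_orientation, (baseline_m, 0.0, 0.0))
    right_position = tuple(p + o for p, o in zip(left_position, offset))
    return right_position, left_orientation


# Left camera at the origin, yawed 90 degrees about the z-axis.
half_angle = math.pi / 4.0
yaw_90 = (math.cos(half_angle), 0.0, 0.0, math.sin(half_angle))
position, orientation = right_camera_pose((0.0, 0.0, 0.0), yaw_90, baseline_m=0.1)
print(position, orientation)
```

Under this assumption, every stereo frame contributes two training samples with consistent labels, which doubles the number of labeled images without collecting extra trajectories.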

Paper Structure

This paper contains 4 sections, 1 equation, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Neural network architecture overview.
  • Figure 2: Example images from our underwater tank datasets.
  • Figure 3: Overview of simulated scene in underwater simulator (top) and the simulated image captured by the ROV (bottom).
  • Figure 4: Predicted Trajectories vs Real Trajectories. Predicted trajectories with tank datasets 1 and 2 (bottom row) are very close to the actual trajectories (top row).