Improved Image-based Pose Regressor Models for Underwater Environments
Luyuan Peng, Hari Vishnu, Mandar Chitre, Yuen Min Too, Bharath Kalyan, Rajat Mishra
TL;DR
This work tackles underwater visual relocalization by regressing a 6-DOF pose from monocular RGB images. It introduces an LSTM-augmented PoseNet-style regression framework and compares it against baseline CNN architectures, using a composite loss $\mathcal{L} = \mathcal{L}_{p} + \beta \mathcal{L}_{q}$ with $\beta=30$, where $\mathcal{L}_{p}=\|\mathbf{p}-\hat{\mathbf{p}}\|_2$ is the position error for position $\mathbf{p}$ and $\mathcal{L}_{q}=\|\mathbf{q}-\hat{\mathbf{q}}\|_2$ is the orientation error for quaternion orientation $\mathbf{q}$. The models are validated on simulated and controlled water-tank datasets, with data augmentation that adds right-camera images from a stereo pair. Results show high pose accuracy in both the simulated and water-tank environments, with data augmentation and the LSTM-based regression offering robustness to underwater visual distortions and turbidity. These findings support the method's potential for reliable underwater navigation and inspection using cost-effective monocular cameras.
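As a minimal sketch of the composite loss described above, the following Python function evaluates $\mathcal{L} = \mathcal{L}_{p} + \beta \mathcal{L}_{q}$ for a single sample. The function name `pose_loss` and its argument names are hypothetical and chosen for illustration; the sketch applies the formula exactly as written, without normalizing the predicted quaternion, and is not necessarily how the authors implemented it.

```python
import numpy as np

BETA = 30.0  # orientation weight beta, as stated in the summary


def pose_loss(p_true, q_true, p_pred, q_pred, beta=BETA):
    """Composite pose loss L = ||p - p_hat||_2 + beta * ||q - q_hat||_2.

    p_true, p_pred: 3-element positions.
    q_true, q_pred: 4-element quaternions (used as given, not normalized;
    this is an assumption of the sketch).
    """
    p_true = np.asarray(p_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    q_true = np.asarray(q_true, dtype=float)
    q_pred = np.asarray(q_pred, dtype=float)

    position_error = np.linalg.norm(p_true - p_pred)      # L_p
    orientation_error = np.linalg.norm(q_true - q_pred)   # L_q
    return float(position_error + beta * orientation_error)
```

For example, a prediction that is 5 units away in position but has a perfect orientation gives a loss of 5; any orientation error is then scaled by $\beta$ before being added, which is how the weight balances the two terms.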
Abstract
We investigate the performance of image-based pose regressor models for relocalization in underwater environments. Leveraging PoseNet and PoseLSTM, we regress a 6-degree-of-freedom pose from single RGB images with high accuracy. Additionally, we explore data augmentation with stereo camera images to improve model accuracy. Experimental results demonstrate that the models achieve high accuracy in both simulated environments and clear water, which is promising for real-world underwater navigation and inspection applications.
