Pose Estimation of Buried Deep-Sea Objects using 3D Vision Deep Learning Models

Jerry Yan, Chinmay Talegaonkar, Nicholas Antipa, Eric Terrill, Sophia Merrifield

TL;DR

This work estimates the pose and burial fraction of buried seabed barrels from ROV imagery in the San Pedro Basin. It proposes a learning-based pipeline that combines underwater 3D reconstruction (DUSt3R) with segmentation (Grounding DINO + SAM) and BarrelNet, a modified PointNet, to predict the barrel axis $\vec{\mathbf{n}}$, radius $r$, and centroid $\mathbf{c}$; the burial fraction $b_f$ is then computed via Monte Carlo sampling. BarrelNet is trained exclusively on synthetically generated, occluded cylinder point clouds that model burial effects. On synthetic tests it substantially outperforms classical least squares cylinder fitting, and it transfers qualitatively to real ROV data. The framework supports quantifying the impact of buried debris on marine environments and provides a basis for extending pose estimation to other underwater objects in future work.

Abstract

We present an approach for pose and burial fraction estimation of debris field barrels found on the seabed in the Southern California San Pedro Basin. Our computational workflow leverages recent advances in foundation models for segmentation and a vision transformer-based approach to estimate the point cloud that defines the geometry of the barrel. We propose BarrelNet, which takes barrel point clouds as input and estimates the 6-DOF pose and radius of buried barrels. We train BarrelNet using synthetically generated barrel point clouds, and qualitatively demonstrate the potential of our approach using remotely operated vehicle (ROV) video footage of barrels found at a historic dump site. We compare our method to a traditional least squares fitting approach and show significant improvement according to our defined benchmarks.
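
As a rough illustration of how a burial fraction could be obtained by Monte Carlo sampling once the axis $\vec{\mathbf{n}}$, radius $r$, and centroid $\mathbf{c}$ are known, the sketch below draws points uniformly inside the cylinder and counts how many fall below the seabed. It is not the authors' implementation. It assumes that $b_f$ is the volume fraction below a planar seabed, that the barrel height is known, and that the seabed plane is given; the function name `burial_fraction` and the parameters `height`, `plane_normal`, and `plane_offset` are hypothetical.

```python
import numpy as np

def burial_fraction(axis, radius, centroid, height,
                    plane_normal, plane_offset,
                    n_samples=100_000, seed=0):
    """Monte Carlo estimate of the cylinder volume fraction below a plane.

    Assumption (not from the paper): a point x is buried when
    dot(plane_normal, x) - plane_offset < 0.
    """
    rng = np.random.default_rng(seed)

    # Unit axis and an orthonormal basis (u, v) for the cross-section.
    n = np.asarray(axis, dtype=float)
    n = n / np.linalg.norm(n)
    helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(n, helper)
    u = u / np.linalg.norm(u)
    v = np.cross(n, u)

    # Uniform samples in the disc (sqrt gives uniform area density)
    # and uniform samples along the axis, centered on the centroid.
    rho = radius * np.sqrt(rng.random(n_samples))
    theta = 2.0 * np.pi * rng.random(n_samples)
    t = height * (rng.random(n_samples) - 0.5)

    pts = (np.asarray(centroid, dtype=float)
           + np.outer(rho * np.cos(theta), u)
           + np.outer(rho * np.sin(theta), v)
           + np.outer(t, n))

    # Signed distance to the seabed plane; negative means buried.
    m = np.asarray(plane_normal, dtype=float)
    m = m / np.linalg.norm(m)
    signed = pts @ m - plane_offset
    return float(np.mean(signed < 0.0))
```

For a barrel lying on its side with its centroid exactly on the seabed plane, for example `burial_fraction([1, 0, 0], 0.3, [0, 0, 0], 0.9, [0, 0, 1], 0.0)`, the estimate should be close to one half. Sampling is a convenient choice here because it handles any axis orientation without a closed-form expression for the cut volume.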

Paper Structure (13 sections, 1 equation, 3 figures)

Figures (3)

  • Figure 1: The overall processing workflow for barrel pose estimation. Given a series of underwater images, we reconstruct the scene as a 3D point cloud using DUSt3R (Wang et al., 2024) and mask the barrel using Grounding DINO + Segment Anything (Liu et al., 2023; Kirillov et al., 2023). We then use this information to isolate the barrel in the point cloud, which we feed to BarrelNet (a modification of the PointNet architecture; Qi et al., 2017) to find the cylinder axis and its dimensions.
  • Figure 2: A qualitative comparison from roughly the same viewpoint between 3D reconstructions from COLMAP + OpenMVS (top) and DUSt3R (bottom). COLMAP + OpenMVS is highly susceptible to imaging conditions, leading to noisy and incomplete point clouds, while DUSt3R creates smooth (sometimes too smooth) and complete point clouds.
  • Figure 3: Results on 4 different seabed barrel scenes (top row) obtained at a water depth of 850 m in the San Pedro Basin, CA. As a qualitative metric, we overlay the fitted cylinders (orange) on the reconstructed DUSt3R point clouds. The barrel axis vectors are shown as red rays. The classical least squares method (middle row) produces inconsistent results due to MLESAC randomness, except for (d). For our method (bottom row), despite slight mismatches (c, d), the barrel axis vectors are largely oriented correctly w.r.t. the ocean floor.
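
The isolation step described in the Figure 1 caption, where a 2D segmentation mask is used to pick the barrel out of the reconstructed point cloud, can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the reconstruction supplies a per-pixel point map of shape (H, W, 3) aligned with the image that was segmented, plus an optional per-pixel confidence map; the function name `isolate_masked_points` and the `min_conf` threshold are hypothetical.

```python
import numpy as np

def isolate_masked_points(point_map, mask, confidence=None, min_conf=0.0):
    """Return the 3D points whose pixels lie inside a binary mask.

    point_map:  (H, W, 3) array of 3D coordinates, one per pixel (assumed layout).
    mask:       (H, W) boolean array, True on barrel pixels.
    confidence: optional (H, W) array; pixels below min_conf are dropped.
    """
    point_map = np.asarray(point_map, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if point_map.shape[:2] != mask.shape or point_map.shape[-1] != 3:
        raise ValueError("point_map must be (H, W, 3) and mask must be (H, W)")

    # Keep masked pixels that have finite coordinates.
    keep = mask & np.isfinite(point_map).all(axis=-1)

    # Optionally drop low-confidence pixels.
    if confidence is not None:
        keep &= np.asarray(confidence, dtype=float) >= min_conf

    # Boolean indexing flattens the selection to an (N, 3) point cloud.
    return point_map[keep]
```

The resulting (N, 3) array is the kind of input a point-cloud network such as BarrelNet would consume; any resampling or normalization before inference is left out because the text above does not specify it.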