Vision-in-the-loop Simulation for Deep Monocular Pose Estimation of UAV in Ocean Environment

Maneesha Wickramasuriya, Beomyeol Yu, Taeyoung Lee, Murray Snyder

TL;DR

This work addresses the challenge of validating deep monocular 6D pose estimation for shipboard UAVs without costly ocean trials. It combines a Transformer-based pose estimator (TNN-MO), trained on synthetic data, with a photorealistic 3D Gaussian Splatting (3DGS) virtual ocean environment. The authors demonstrate accurate pose estimation from monocular images and validate it both on real-world data and in an indoor simulation of the vessel environment. They also introduce a vision-in-the-loop framework that couples Vicon ground truth, simulated scenes, and real-time pose estimation for testing vision-based control. Key contributions include the TNN-MO architecture with EPnP and Bayesian fusion, a large-scale synthetic data pipeline, real-world validation on a research vessel, and a real-time Gaussian-splat indoor simulator for end-to-end testing. The framework enables cost-effective, scalable validation of maritime UAV autonomy and may extend to other vision-driven robotics domains.

Abstract

This paper proposes a vision-in-the-loop simulation environment for deep monocular pose estimation of a UAV operating in an ocean environment. Recently, a deep neural network with a transformer architecture has been successfully trained to estimate the pose of a UAV relative to the flight deck of a research vessel, overcoming several limitations of GPS-based approaches. However, validating the deep pose estimation scheme in an actual ocean environment poses significant challenges due to the limited availability of research vessels and the associated operational costs. To address these issues, we present a photo-realistic 3D virtual environment leveraging recent advancements in Gaussian splatting, a novel technique that represents 3D scenes by modeling image pixels as Gaussian distributions in 3D space, creating a lightweight and high-quality visual model from multiple viewpoints. This approach enables the creation of a virtual environment integrating multiple real-world images collected in situ. The resulting simulation enables the indoor testing of flight maneuvers while verifying all aspects of flight software, hardware, and the deep monocular pose estimation scheme. This approach provides a cost-effective solution for testing and validating the autonomous flight of shipboard UAVs, specifically focusing on vision-based control and estimation algorithms.

Paper Structure

This paper contains 16 sections, 17 figures, 1 table.

Figures (17)

  • Figure 1: TNN-MO model tested on synthetic data: for object classes with confidence scores greater than 0.9, the keypoints and base frame are illustrated in different colors.
  • Figure 2: TNN-MO model tested on real-world data: for object classes with confidence scores greater than 0.9, the keypoints and base frame are highlighted in different colors.
  • Figure 3: Position estimation for overexposed real-world images: the estimated position (orange) is compared against the RTK-GPS measurements (blue).
  • Figure 4: Comparison of real-world images captured by the GoPro Hero 13 Black camera (first row) and photo-realistic images generated from the 3D Gaussian Splatting (3DGS) model (second row).
  • Figure 5: GoPro camera mounted on an octocopter UAV manually flown around a USNA research vessel to collect optical data in Chesapeake Bay, Maryland.
  • ...and 12 more figures