Vision-in-the-loop Simulation for Deep Monocular Pose Estimation of UAV in Ocean Environment
Maneesha Wickramasuriya, Beomyeol Yu, Taeyoung Lee, Murray Snyder
TL;DR
This work addresses the challenge of validating deep monocular 6D pose estimation for shipboard UAVs without costly ocean trials. It combines a Transformer-based pose estimator (TNN-MO), trained on synthetic data, with a photorealistic virtual ocean environment built with 3D Gaussian Splatting (3DGS). The authors demonstrate accurate pose estimation from monocular images, validate it on real-world data from a research vessel and in indoor simulations modeled on the vessel, and introduce a vision-in-the-loop framework that couples Vicon ground truth, simulated scenes, and real-time pose estimation for testing vision-based control. Key contributions include the TNN-MO architecture with EPnP and Bayesian fusion, a large-scale synthetic data pipeline, real-world validation on a research vessel, and a real-time Gaussian-splat indoor simulator for end-to-end testing. The framework enables cost-effective, scalable validation of maritime UAV autonomy and may extend to other vision-driven robotics domains.
Abstract
This paper proposes a vision-in-the-loop simulation environment for deep monocular pose estimation of a UAV operating in an ocean environment. Recently, a deep neural network with a transformer architecture has been successfully trained to estimate the pose of a UAV relative to the flight deck of a research vessel, overcoming several limitations of GPS-based approaches. However, validating this deep pose estimation scheme in an actual ocean environment is difficult because research vessels are rarely available and costly to operate. To address these issues, we present a photorealistic 3D virtual environment built with Gaussian splatting, a recent technique that represents a 3D scene as a collection of 3D Gaussians fitted to images taken from multiple viewpoints, yielding a lightweight, high-quality visual model. The virtual environment is constructed from multiple real-world images collected in situ. The resulting simulation enables indoor testing of flight maneuvers while verifying all aspects of the flight software, the hardware, and the deep monocular pose estimation scheme. It provides a cost-effective way to test and validate the autonomous flight of shipboard UAVs, with a focus on vision-based control and estimation algorithms.
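The data flow of a vision-in-the-loop test can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`vicon_measure`, `render_virtual_image`, `estimate_pose`, `simulate`), the proportional velocity controller, the noise level, and the reduction of pose to a 3D position are all assumptions made for illustration. The renderer and the estimator are stubs that stand in for the Gaussian-splat scene and the deep pose network, respectively.

```python
import math
import random


def vicon_measure(true_position):
    """Stand-in for motion capture: returns the vehicle's indoor position."""
    return list(true_position)


def render_virtual_image(camera_position):
    """Hypothetical renderer stub. A real system would rasterize the
    Gaussian-splat scene from this viewpoint; here the 'image' simply
    carries the camera position."""
    return {"camera_position": list(camera_position)}


def estimate_pose(image, rng, noise_std=0.02):
    """Hypothetical estimator stub: recovers the position with Gaussian
    noise (assumed level), in place of a deep monocular pose network."""
    return [c + rng.gauss(0.0, noise_std) for c in image["camera_position"]]


def simulate(start, target, steps=200, dt=0.05, gain=1.0, seed=0):
    """Closes the loop: track -> render -> estimate -> control -> move."""
    rng = random.Random(seed)
    position = [float(c) for c in start]
    errors = []
    for _ in range(steps):
        tracked = vicon_measure(position)      # ground truth from tracking
        image = render_virtual_image(tracked)  # synthetic camera view
        estimate = estimate_pose(image, rng)   # vision-based estimate
        errors.append(math.dist(estimate, tracked))
        # Assumed proportional velocity command driven by the estimate only.
        position = [
            p + dt * gain * (t - e)
            for p, t, e in zip(position, target, estimate)
        ]
    return position, errors


final_position, estimation_errors = simulate([2.0, 1.0, 3.0], [0.0, 0.0, 1.0])
print("final position:", [round(c, 3) for c in final_position])
print("mean estimation error:", sum(estimation_errors) / len(estimation_errors))
```

Because the controller sees only the vision-based estimate while the tracked position serves as ground truth, the logged errors measure the estimator under closed-loop conditions, which is the kind of check such a framework is meant to support.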
