6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting
Yufeng Jin, Vignesh Prasad, Snehal Jauhri, Mathias Franzius, Georgia Chalvatzaki
TL;DR
6DOPE-GS presents a model-free pipeline for online 6D object pose estimation and reconstruction from a single RGB-D camera by leveraging 2D Gaussian Splatting. It introduces a Gaussian Object Field optimized jointly with an online keyframe pose graph, aided by dynamic keyframe selection and opacity-based pruning to achieve real-time performance. The approach attains competitive accuracy on HO3D and YCBInEOAT with about a 5× speedup over neural-field baselines and demonstrates live tracking at practical frame rates. This work advances real-time, model-free 6D tracking and on-the-fly object reconstruction, with potential impact on robotics and augmented reality applications that require fast, robust pose estimation without CAD models.
Abstract
Efficient and accurate object pose estimation is an essential component for modern vision systems in many applications such as Augmented Reality, autonomous driving, and robotics. While research in model-based 6D object pose estimation has delivered promising results, model-free methods are hindered by the high computational load in rendering and inferring consistent poses of arbitrary objects in a live RGB-D video stream. To address this issue, we present 6DOPE-GS, a novel method for online 6D object pose estimation and tracking with a single RGB-D camera by effectively leveraging advances in Gaussian Splatting. Thanks to the fast differentiable rendering capabilities of Gaussian Splatting, 6DOPE-GS can simultaneously optimize for 6D object poses and 3D object reconstruction. To achieve the necessary efficiency and accuracy for live tracking, our method uses incremental 2D Gaussian Splatting with an intelligent dynamic keyframe selection procedure to achieve high spatial object coverage and prevent erroneous pose updates. We also propose an opacity statistic-based pruning mechanism for adaptive Gaussian density control, to ensure training stability and efficiency. We evaluate our method on the HO3D and YCBInEOAT datasets and show that 6DOPE-GS matches the performance of state-of-the-art baselines for model-free simultaneous 6D pose tracking and reconstruction while providing a 5× speedup. We also demonstrate the method's suitability for live, dynamic object tracking and reconstruction in a real-world setting.
