6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting

Yufeng Jin, Vignesh Prasad, Snehal Jauhri, Mathias Franzius, Georgia Chalvatzaki

TL;DR

6DOPE-GS presents a model-free pipeline for online 6D object pose estimation and reconstruction from a single RGB-D camera by leveraging 2D Gaussian Splatting. It introduces a Gaussian Object Field optimized jointly with an online keyframe pose graph, aided by dynamic keyframe selection and opacity-based pruning to achieve real-time performance. The approach attains competitive accuracy on HO3D and YCBInEOAT with about a 5× speedup over neural-field baselines and demonstrates live tracking at practical frame rates. This work advances real-time, model-free 6D tracking and on-the-fly object reconstruction, with potential impact on robotics and augmented reality applications that require fast, robust pose estimation without CAD models.
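The opacity-based pruning step can be pictured as a data-dependent cutoff over per-Gaussian opacities. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes opacities are already mapped to [0, 1] (for example after a sigmoid). The function name `prune_by_opacity_percentile`, the percentile default, and the absolute floor `min_opacity` are hypothetical choices.

```python
import numpy as np


def prune_by_opacity_percentile(opacities: np.ndarray,
                                percentile: float = 5.0,
                                min_opacity: float = 0.005) -> np.ndarray:
    """Return a boolean keep-mask over Gaussians (hypothetical pruning rule).

    A Gaussian is dropped if its opacity is below the given percentile of the
    current opacity distribution or below an absolute floor, whichever is higher.
    """
    opacities = np.asarray(opacities, dtype=np.float64).ravel()
    if opacities.size == 0:
        return np.zeros(0, dtype=bool)
    # Adaptive cutoff: it moves with the opacity distribution as training proceeds.
    threshold = max(float(np.percentile(opacities, percentile)), min_opacity)
    return opacities >= threshold


# Usage: apply the same mask to every per-Gaussian parameter array.
opacity = np.array([0.001, 0.01, 0.2, 0.5, 0.9])
means = np.zeros((5, 3))
keep = prune_by_opacity_percentile(opacity, percentile=20.0)
means, opacity = means[keep], opacity[keep]
```

Because the cutoff is a percentile of the current distribution rather than a fixed constant, it adapts as opacities evolve during incremental training. The floor keeps the rule from retaining nearly transparent Gaussians when the percentile itself becomes very small.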

Abstract

Efficient and accurate object pose estimation is an essential component for modern vision systems in many applications such as Augmented Reality, autonomous driving, and robotics. While research in model-based 6D object pose estimation has delivered promising results, model-free methods are hindered by the high computational load in rendering and inferring consistent poses of arbitrary objects in a live RGB-D video stream. To address this issue, we present 6DOPE-GS, a novel method for online 6D object pose estimation and tracking with a single RGB-D camera by effectively leveraging advances in Gaussian Splatting. Thanks to the fast differentiable rendering capabilities of Gaussian Splatting, 6DOPE-GS can simultaneously optimize for 6D object poses and 3D object reconstruction. To achieve the necessary efficiency and accuracy for live tracking, our method uses incremental 2D Gaussian Splatting with an intelligent dynamic keyframe selection procedure to achieve high spatial object coverage and prevent erroneous pose updates. We also propose an opacity statistic-based pruning mechanism for adaptive Gaussian density control, to ensure training stability and efficiency. We evaluate our method on the HO3D and YCBInEOAT datasets and show that 6DOPE-GS matches the performance of state-of-the-art baselines for model-free simultaneous 6D pose tracking and reconstruction while providing a 5× speedup. We also demonstrate the method's suitability for live, dynamic object tracking and reconstruction in a real-world setting.
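Keyframe selection driven by spatial coverage and reconstruction confidence can be illustrated with a simple greedy filter. This is an assumed stand-in, not the paper's exact rule: it discards low-confidence keyframes, then keeps a keyframe only if its viewing direction in the object frame lies at least some angle away from every keyframe already kept. The `Keyframe` fields, the confidence score, and both thresholds are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Keyframe:
    frame_id: int
    view_dir: np.ndarray  # camera viewing direction in the object frame (assumed input)
    confidence: float     # hypothetical score, e.g. derived from rendering error


def select_keyframes(keyframes, min_angle_deg=15.0, min_confidence=0.5):
    """Greedy coverage-based selection (illustrative, not the paper's exact rule)."""
    # 1) Reject keyframes whose reconstruction confidence is too low.
    candidates = [kf for kf in keyframes if kf.confidence >= min_confidence]
    # 2) Visit the most confident keyframes first.
    candidates.sort(key=lambda kf: kf.confidence, reverse=True)
    cos_thresh = np.cos(np.deg2rad(min_angle_deg))
    selected, kept_dirs = [], []
    for kf in candidates:
        d = kf.view_dir / np.linalg.norm(kf.view_dir)
        # 3) Keep it only if it is farther than min_angle_deg from every kept view.
        if all(float(np.dot(d, k)) < cos_thresh for k in kept_dirs):
            selected.append(kf)
            kept_dirs.append(d)
    return selected


frames = [
    Keyframe(0, np.array([0.0, 0.0, 1.0]), 0.9),
    Keyframe(1, np.array([0.0, 0.05, 1.0]), 0.8),  # nearly the same view as frame 0
    Keyframe(2, np.array([1.0, 0.0, 0.0]), 0.7),   # new viewpoint
    Keyframe(3, np.array([0.0, 1.0, 0.0]), 0.2),   # low confidence
]
print([kf.frame_id for kf in select_keyframes(frames)])  # [0, 2]
```

In the pipeline as the Figure 1 caption describes it, keyframes filtered out at this stage are not simply discarded: their poses are still updated once the Gaussian Object Field is temporarily frozen.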

Paper Structure

This paper contains 20 sections, 1 equation, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of our approach: 6DOPE-GS. Given a live input RGB-D video stream, we obtain object segmentation masks using SAM2 [raviSAM2Segment] on the incoming video frames. We then use LoFTR [sunLoFTRDetectorFreeLocal2021], a transformer-based feature-matching approach, to obtain pairwise correspondences between multiple views. We initialize a set of "keyframes" based on the density of matched features, for which we establish initial coarse pose estimates using RANSAC. To obtain refined pose updates for the keyframes, we use a 2D Gaussian Splatting-based "Gaussian Object Field" that is jointly optimized with the keyframe poses in a concurrent thread. We filter out erroneous keyframes for accurate pose refinement updates using a novel dynamic keyframe selection mechanism based on spatial coverage and reconstruction confidence. Moreover, we incorporate an opacity percentile-based adaptive density control mechanism to prune out inconsequential Gaussians, thus improving training stability and efficiency. Once the Gaussian Object Field is updated, it is temporarily frozen and the poses of keyframes that were filtered out are also updated. The object pose estimate at each timestep is then obtained by performing an online pose graph optimization using the incoming keyframe with the current set of keyframes.
  • Figure 2: Qualitative results of our method, 6DOPE-GS, tested on video sequences from the HO3D dataset, namely AP13, MPM14, SB13, and SM1 (from top to bottom). Left: our method tracks the 6D object pose over time with high accuracy. Right: 6DOPE-GS is effective at reconstructing both the appearance (rows 1 and 3) and surface geometry (rows 2 and 4) of the object over time. In each case, the first image shows the initial reconstruction at the beginning of the sequence, and the second image shows the reconstruction refined over time.
  • Figure 3: Comparison between temporal efficiency and performance for different approaches on the HO3D dataset. While BundleSDF achieves high performance, it comes at the cost of speed. On the other hand, 6DOPE-GS achieves a favorable tradeoff between speed and performance.
  • Figure 4: Example of real-time object tracking. Top row: Live video, object segmentation results, and pose tracking results. Bottom row: Rendered outputs, including color, depth, and surface normals derived from the Gaussian models.
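Figure 1 states that coarse keyframe poses are obtained with RANSAC over LoFTR correspondences. One common way to do this with RGB-D input, assumed here rather than taken from the paper, is to back-project matched pixels to 3D using depth and the camera intrinsics `K`, then fit a rigid transform with RANSAC around the Kabsch (SVD) solver. The function names, the iteration count, and the inlier threshold (in the depth units, e.g. meters) are hypothetical.

```python
import numpy as np


def backproject(uv, depth, K):
    """Lift pixels (N, 2) with metric depth (N,) to 3D points in the camera frame."""
    x = (uv[:, 0] - K[0, 2]) * depth / K[0, 0]
    y = (uv[:, 1] - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=1)


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~ src @ R.T + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    sign = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, sign]) @ U.T
    return R, mu_d - R @ mu_s


def ransac_rigid(src, dst, iters=200, inlier_thresh=0.01, seed=0):
    """Robustly fit (R, t) to 3D-3D matches that contain outliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)  # minimal sample
        R, t = kabsch(src[idx], dst[idx])
        inliers = np.linalg.norm(src @ R.T + t - dst, axis=1) < inlier_thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("too few inliers for a rigid fit")
    R, t = kabsch(src[best], dst[best])  # refit on all inliers
    return R, t, best
```

In this reading, `src` and `dst` would be back-projected matches between a new frame and an existing keyframe, and the resulting (R, t) a coarse relative pose that the later Gaussian Object Field optimization and pose graph optimization refine.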