GS-Pose: Generalizable Segmentation-based 6D Object Pose Estimation with 3D Gaussian Splatting
Dingding Cai, Janne Heikkilä, Esa Rahtu
TL;DR
GS-Pose tackles generalizable 6D object pose estimation from RGB images for novel objects. Offline, it builds a reference database that stores three representations of each object; online, it runs a cascaded inference pipeline consisting of a segmentation-based detector, initial pose estimation via rotation-aware template retrieval, and a differentiable render-and-compare refiner (GS-Refiner). The core contributions are (i) three complementary object representations: a semantic representation, rotation-aware embeddings, and a 3D Gaussian Object representation; (ii) a segmentation-based detection and rotation-aware matching pipeline; and (iii) a fast, differentiable 3D Gaussian splatting renderer that enables iterative pose refinement. The approach achieves state-of-the-art results on LINEMOD and OnePose-LowTexture, performs strongly on textureless and symmetric objects, and needs only commodity hardware for data capture. By combining stage-specific representations with differentiable rendering-based refinement, this work advances RGB-only, model-free pose estimation, with practical implications for robotics and AR settings where new objects can be captured rapidly.
Abstract
This paper introduces GS-Pose, a unified framework for localizing and estimating the 6D pose of novel objects. GS-Pose begins with a set of posed RGB images of a previously unseen object and builds three distinct representations, which are stored in a database. At inference, GS-Pose operates sequentially: it locates the object in the input image, estimates its initial 6D pose using a retrieval approach, and refines the pose with a render-and-compare method. The key insight is to apply the appropriate object representation at each stage of the process. In particular, for the refinement step, we leverage 3D Gaussian splatting, a novel differentiable rendering technique that offers high rendering speed and relatively low optimization time. Off-the-shelf toolchains and commodity hardware, such as mobile phones, can be used to capture new objects to be added to the database. Extensive evaluations on the LINEMOD and OnePose-LowTexture datasets demonstrate excellent performance, establishing a new state of the art. Project page: https://dingdingcai.github.io/gs-pose.
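The retrieval stage described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each reference view is stored as a fixed-length embedding paired with a known rotation, and that the initial rotation is taken from the reference whose embedding has the highest cosine similarity to the query. The function name `retrieve_initial_rotation`, the toy 16-dimensional embeddings, and the eight-view database are hypothetical and are not taken from the GS-Pose implementation.

```python
import numpy as np


def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def retrieve_initial_rotation(query_emb, ref_embs, ref_rotations):
    """Pick the reference rotation whose embedding best matches the query.

    Hypothetical helper: similarity is plain cosine similarity, which is
    an assumption of this sketch, not the paper's matching function.
    """
    q = query_emb / np.linalg.norm(query_emb)
    r = ref_embs / np.linalg.norm(ref_embs, axis=1, keepdims=True)
    scores = r @ q                      # cosine similarity per reference view
    best = int(np.argmax(scores))
    return ref_rotations[best], best, float(scores[best])


# Toy reference database: 8 views rotated about the z-axis (assumed layout).
rng = np.random.default_rng(0)
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
ref_rotations = np.stack([rot_z(a) for a in angles])
ref_embs = rng.normal(size=(8, 16))

# Query: a slightly perturbed copy of the embedding of reference view 3.
query_emb = ref_embs[3] + 0.01 * rng.normal(size=16)

R_init, idx, score = retrieve_initial_rotation(query_emb, ref_embs, ref_rotations)
```

In a full pipeline, a rotation retrieved this way would only serve as the starting point that the render-and-compare refinement then improves.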
