End-to-End Probabilistic Geometry-Guided Regression for 6DoF Object Pose Estimation
Thomas Pöllabauer, Jiayin Li, Volker Knauthe, Sarah Berkei, Arjan Kuijper
TL;DR
6DoF pose estimation is ill-posed under occlusion and symmetry, which poses a challenge for single-pose predictions. The authors introduce EPRO-GDR, a probabilistic extension of GDRNPP that uses EPro-PnP to output a pose distribution rather than a single pose. Training follows a two-phase regime and includes a $D_{KL}$ loss to align distributions; the resulting model allows sampling multiple pose candidates together with an uncertainty measure. EPRO-GDR improves AR_BOP scores over the baseline on LM-O, YCB-V, and ITODD, and the authors show that distribution learning benefits scene-level optimization. The work advances end-to-end probabilistic geometry-guided regression for robust pose estimation in XR and robotics contexts.
Abstract
6D object pose estimation is the problem of identifying the position and orientation of an object relative to a chosen coordinate system, and it is a core technology for modern XR applications. State-of-the-art 6D object pose estimators directly predict a single object pose from an object observation. Because the pose estimation problem is ill-posed, with multiple different poses able to explain a single observation, generating additional plausible estimates per observation can be valuable. To address this, we reformulate the state-of-the-art algorithm GDRNPP and introduce EPRO-GDR (End-to-End Probabilistic Geometry-Guided Regression). Instead of predicting a single pose per detection, we estimate a probability density over poses. Using the evaluation procedure defined by the BOP (Benchmark for 6D Object Pose Estimation) Challenge, we test our approach on four of its core datasets and demonstrate superior quantitative results for EPRO-GDR on LM-O, YCB-V, and ITODD. Our probabilistic solution shows that predicting a pose distribution instead of a single pose can improve state-of-the-art single-view pose estimation while providing the additional benefit of being able to sample multiple meaningful pose candidates.
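
To illustrate what sampling multiple pose candidates from a pose distribution could look like, the following is a minimal sketch and not the EPRO-GDR implementation. It assumes the distribution is approximated by a discrete set of candidate poses with unnormalized log-densities; the function names, the toy yaw-only poses, and the log-weights are all hypothetical and chosen only to mimic an object with a 180-degree symmetry.

```python
import math
import random


def normalize_log_weights(log_weights):
    """Convert unnormalized log-densities into probabilities (softmax)."""
    m = max(log_weights)  # subtract the maximum for numerical stability
    exps = [math.exp(lw - m) for lw in log_weights]
    total = sum(exps)
    return [e / total for e in exps]


def sample_pose_candidates(poses, log_weights, k, seed=0):
    """Draw k candidates (with replacement) in proportion to their probability."""
    probs = normalize_log_weights(log_weights)
    rng = random.Random(seed)
    indices = rng.choices(range(len(poses)), weights=probs, k=k)
    return [(poses[i], probs[i]) for i in indices]


def entropy(probs):
    """Shannon entropy in nats; used here as a simple uncertainty measure."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical toy data: yaw angles in degrees. The poses at 0 and 180
# degrees explain the observation equally well (symmetric object).
poses = [0.0, 90.0, 180.0, 270.0]
log_weights = [-0.1, -8.0, -0.1, -8.0]

probs = normalize_log_weights(log_weights)
candidates = sample_pose_candidates(poses, log_weights, k=5)
uncertainty = entropy(probs)
```

In this toy setting the two symmetric poses receive equal probability, so a single-pose predictor would have to pick one arbitrarily, whereas sampling returns both modes and the entropy gives a scalar indication of how ambiguous the observation is.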
