Confronting Ambiguity in 6D Object Pose Estimation via Score-Based Diffusion on SE(3)
Tsu-Ching Hsiao, Hao-Wei Chen, Hsuan-Kung Yang, Chun-Yi Lee
TL;DR
This work introduces the first image-domain diffusion model on the $SE(3)$ group for 6D object pose estimation, addressing pose ambiguity stemming from symmetries and occlusions by jointly modeling rotation and translation. It develops a score-based framework on Lie groups with a surrogate Stein score to improve convergence and efficiency, demonstrated on synthetic SYMSOL/SYMSOL-T data and the real-world T-LESS dataset. The approach learns multi-modal pose distributions without symmetry annotations or 3D models, achieving competitive rotation accuracy and real-time inference. Ablation studies validate the advantages of the surrogate score, the benefits of the $SE(3)$ parametrization over $\mathbb{R}^3SO(3)$, and the robustness to perspective-induced ambiguity. Overall, the method offers a principled, scalable strategy for robust 6D pose estimation in challenging visual conditions.
Abstract
Addressing pose ambiguity in 6D object pose estimation from single RGB images presents a significant challenge, particularly due to object symmetries or occlusions. In response, we introduce a novel score-based diffusion method applied to the $SE(3)$ group, marking the first application of diffusion models to $SE(3)$ within the image domain, specifically tailored for pose estimation tasks. Extensive evaluations demonstrate the method's efficacy in handling pose ambiguity and mitigating perspective-induced ambiguity, and showcase the robustness of our surrogate Stein score formulation on $SE(3)$. This formulation not only improves the convergence of the denoising process but also enhances computational efficiency. Thus, we pioneer a promising strategy for 6D object pose estimation.
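
To make concrete what modeling rotation and translation jointly on $SE(3)$ can mean, the following is a minimal NumPy sketch and not the authors' implementation. It contrasts a pose perturbation applied through the $SE(3)$ exponential map with a decoupled update that treats translation in $\mathbb{R}^3$ and rotation in $SO(3)$ separately. The function names, the right-multiplication convention, and the choice of twist are all assumptions made for illustration.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix."""
    x, y, z = w
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def exp_so3(w):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    W = hat(w)
    if theta < 1e-8:
        return np.eye(3) + W  # first-order approximation near the identity
    return (np.eye(3) + np.sin(theta) / theta * W
            + (1.0 - np.cos(theta)) / theta**2 * W @ W)

def exp_se3(w, v):
    """Exponential map on SE(3): twist (w, v) -> (R, t) with t = V(w) v."""
    theta = np.linalg.norm(w)
    W = hat(w)
    if theta < 1e-8:
        V = np.eye(3) + 0.5 * W
    else:
        V = (np.eye(3) + (1.0 - np.cos(theta)) / theta**2 * W
             + (theta - np.sin(theta)) / theta**3 * W @ W)
    return exp_so3(w), V @ v

def perturb_se3(R, t, w, v):
    """Coupled update (assumed convention): right-multiply the pose by exp(twist)."""
    dR, dt = exp_se3(w, v)
    return R @ dR, R @ dt + t

def perturb_r3so3(R, t, w, v):
    """Decoupled update: rotate on SO(3), add the translation directly in R^3."""
    return R @ exp_so3(w), t + v

# Hypothetical example: identity pose, quarter turn about z, unit step along x.
R0, t0 = np.eye(3), np.zeros(3)
w = np.array([0.0, 0.0, np.pi / 2])
v = np.array([1.0, 0.0, 0.0])
R_a, t_a = perturb_se3(R0, t0, w, v)
R_b, t_b = perturb_r3so3(R0, t0, w, v)
```

In this sketch both updates produce the same rotation, but the translations differ whenever the rotational part of the twist is nonzero, because the $SE(3)$ exponential map bends the translation along the rotation. With zero rotation the two updates coincide.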
