
Confronting Ambiguity in 6D Object Pose Estimation via Score-Based Diffusion on SE(3)

Tsu-Ching Hsiao, Hao-Wei Chen, Hsuan-Kung Yang, Chun-Yi Lee

TL;DR

This work introduces the first image-domain diffusion model on the SE(3) group for 6D object pose estimation. It addresses pose ambiguity stemming from symmetries and occlusions by jointly modeling rotation and translation. It develops a score-based framework on Lie groups with a surrogate Stein score that improves convergence and efficiency, and demonstrates it on synthetic SYMSOL/SYMSOL-T data and the real-world T-LESS dataset. The approach learns multi-modal pose distributions without symmetry annotations or 3D models, achieving competitive rotation accuracy and real-time inference. Ablation studies validate the advantages of the surrogate score, the benefits of the SE(3) parametrization over $\mathbb{R}^3 \times SO(3)$, and the robustness to perspective-induced ambiguity. Overall, the method offers a principled, scalable strategy for robust 6D pose estimation in challenging visual conditions.
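To make "diffusion on SE(3)" concrete, the sketch below perturbs a pose in the Lie algebra and takes one denoising step back along the tangent direction. It assumes noise is injected on the right, $\tilde{X} = X\,\mathrm{Exp}(z)$ with $z \sim \mathcal{N}(0, \sigma^2 I_6)$, and uses the closed-form score $-z/\sigma^2$ of that tangent-space Gaussian, computed from the clean pose $X$. This is an illustrative oracle, not the authors' surrogate Stein score, which a network must predict from the image and $\tilde{X}$ alone. All function names (`se3_exp`, `se3_log`, `perturb`, `tangent_score`, `denoise_step`) are hypothetical.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its 3x3 skew-symmetric matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def se3_exp(xi):
    """Exponential map: 6-vector (rho, omega) in se(3) -> 4x4 pose in SE(3)."""
    rho, omega = xi[:3], xi[3:]
    theta = np.linalg.norm(omega)
    W = hat(omega)
    if theta < 1e-8:
        # First-order approximation near the identity
        R = np.eye(3) + W
        V = np.eye(3) + 0.5 * W
    else:
        A = np.sin(theta) / theta
        B = (1 - np.cos(theta)) / theta**2
        C = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + A * W + B * W @ W   # Rodrigues' formula
        V = np.eye(3) + B * W + C * W @ W   # couples translation to rotation
    X = np.eye(4)
    X[:3, :3] = R
    X[:3, 3] = V @ rho
    return X

def se3_log(X):
    """Logarithm map: 4x4 pose -> 6-vector (rho, omega); assumes rotation angle < pi."""
    R, t = X[:3, :3], X[:3, 3]
    theta = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if theta < 1e-8:
        W = 0.5 * (R - R.T)
        V_inv = np.eye(3) - 0.5 * W
    else:
        W = theta / (2 * np.sin(theta)) * (R - R.T)
        k = (1 - theta * np.sin(theta) / (2 * (1 - np.cos(theta)))) / theta**2
        V_inv = np.eye(3) - 0.5 * W + k * W @ W
    omega = np.array([W[2, 1], W[0, 2], W[1, 0]])
    return np.concatenate([V_inv @ t, omega])

def perturb(X, sigma, rng):
    """Forward noising (assumed): X_tilde = X @ Exp(z), z ~ N(0, sigma^2 I_6)."""
    z = sigma * rng.standard_normal(6)
    return X @ se3_exp(z), z

def tangent_score(X_tilde, X, sigma):
    """Oracle score of the tangent Gaussian: -z / sigma^2 with z = Log(X^-1 X_tilde)."""
    z = se3_log(np.linalg.inv(X) @ X_tilde)
    return -z / sigma**2

def denoise_step(X_tilde, score, sigma):
    """Move along the group: X_hat = X_tilde @ Exp(sigma^2 * score)."""
    return X_tilde @ se3_exp(sigma**2 * score)

rng = np.random.default_rng(0)
X = se3_exp(np.array([0.1, -0.2, 0.3, 0.2, 0.1, -0.4]))
sigma = 0.2
X_tilde, z = perturb(X, sigma, rng)
X_hat = denoise_step(X_tilde, tangent_score(X_tilde, X, sigma), sigma)
print(np.allclose(X_hat, X))  # True
```

With the exact score, a single step recovers $X$ because $\mathrm{Exp}(z)^{-1} = \mathrm{Exp}(-z)$; a learned score is only approximate, which is why sampling uses several steps. Under an $\mathbb{R}^3 \times SO(3)$ parametrization the translation would instead be perturbed additively, independent of the rotation; the matrix `V` in `se3_exp` is what couples the two on SE(3).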

Abstract

Addressing pose ambiguity in 6D object pose estimation from single RGB images presents a significant challenge, particularly due to object symmetries or occlusions. In response, we introduce a novel score-based diffusion method applied to the $SE(3)$ group, marking the first application of diffusion models to $SE(3)$ within the image domain, specifically tailored for pose estimation tasks. Extensive evaluations demonstrate the method's efficacy in handling pose ambiguity and mitigating perspective-induced ambiguity, and showcase the robustness of our surrogate Stein score formulation on $SE(3)$. This formulation not only improves the convergence of the denoising process but also enhances computational efficiency. Thus, we pioneer a promising strategy for 6D object pose estimation.

Paper Structure (51 sections, 33 equations, 8 figures, 12 tables, 2 algorithms)

Figures (8)

  • Figure 1: Visualization of the denoising process of our score-based diffusion method on $SE(3)$ for 6DoF pose estimation.
  • Figure 2: Left: Framework overview. Right: Visualization of a denoising step from a noisy sample $\tilde{X}$ to its clean counterpart $X$ on $SE(2)$. The contours show the distance to $X$ in 2D Euclidean space. Each line represents a denoising path with a different number of sub-sampling steps.
  • Figure 3: Visualization of our $SE(3)$ diffusion results on SYMSOL-T. Each plot contains $1,000$ sampled poses generated by our model. The first row depicts the densities of discrete symmetrical shapes: (a) tetrahedron, (b) cube, (c) icosahedron, each possessing 12, 24 and 60 discrete symmetries, respectively. The second row presents the densities of continuous symmetrical objects: (d) cone and (e) cylinder, with each shape exhibiting 1 and 2 continuous symmetries, respectively.
  • Figure 4: Visualization of our $SE(3)$ diffusion results on T-LESS. In the first row, we present our estimation results for three objects in cluttered scenes: (a) Object 9, characterized by 2 discrete symmetries; (b) Object 27, featuring 4 discrete symmetries; and (c) Object 14, possessing 1 continuous symmetry. The second row illustrates pose ambiguities arising from occlusion and self-occlusion, particularly for Object 4. Notably, this object is annotated with 1 continuous symmetry by a human annotator, which does not accurately capture the true ambiguities in certain cases. We illustrate scenarios where (d) the object has no symmetry if the top feature is visible; (e) it has 2 discrete symmetries when the feature is self-occluded but the two screw holes at the bottom are revealed; and (f) it has 1 continuous symmetry if the screw holes are also occluded by the scene. Each plot contains $1,000$ pose samples from our model. The samples are concentrated on each mode of the distribution, indicating that our model can generate precise rotation estimates across different objects.
  • Figure 5: Visualizing pose ambiguity caused by image perspective. The rotations between the four cubes differ by an angle of 15 degrees.
  • ...and 3 more figures
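The discrete symmetry counts quoted for Figure 3 can be checked directly. The sketch below (pure Python; the helper names are hypothetical) enumerates the rotations of an axis-aligned cube as the 3×3 signed permutation matrices with determinant +1, then keeps those that map a regular tetrahedron inscribed in the cube onto itself. For a textureless object, each such rotation yields an image identical to the original pose, so the pose distribution has one rotation mode per symmetry.

```python
from itertools import permutations, product

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def matvec(m, v):
    """Apply a 3x3 matrix to a 3-vector, returning a tuple."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))

def cube_rotations():
    """Signed permutation matrices with det +1: rotations preserving an axis-aligned cube."""
    mats = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            m = [[0, 0, 0] for _ in range(3)]
            for row in range(3):
                m[row][perm[row]] = signs[row]
            if det3(m) == 1:  # exclude reflections
                mats.append(m)
    return mats

# Vertices of a regular tetrahedron inscribed in the cube [-1, 1]^3
TETRA = {(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)}

def tetrahedron_rotations():
    """Cube rotations that map the inscribed tetrahedron onto itself."""
    return [m for m in cube_rotations() if {matvec(m, v) for v in TETRA} == TETRA]

print(len(cube_rotations()))         # 24
print(len(tetrahedron_rotations()))  # 12
```

The same counting idea extends to the icosahedron's 60 rotations, although its vertex coordinates involve the golden ratio and are omitted here.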