Particle-based 6D Object Pose Estimation from Point Clouds using Diffusion Models
Christian Möller, Niklas Funk, Jan Peters
TL;DR
This work introduces a diffusion-based approach for 6D object pose estimation from a single depth view using point clouds. By embedding scene and object observations in an SE(3)-equivariant latent space via VectorNeurons, it generates multiple pose hypotheses and employs two lightweight, training-free particle-selection strategies to distill them into a single pose. Partial rendering and SE(3)-equivariant latent representations are shown to improve accuracy and inference speed, enabling effective handling of occlusions and object symmetries on the Linemod dataset. The method achieves competitive accuracy with notable efficiency gains, particularly when rendering is performed only every few iterations, making it practical for real-time or near-real-time deployment in 3D vision tasks.
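VectorNeurons represent each feature as a 3D vector rather than a scalar, so a linear layer only mixes feature channels and never the xyz coordinates; this is what lets a rotation of the input pass straight through to the latent features. The sketch below checks this numerically with NumPy for a single VN linear layer. It assumes a hypothetical toy encoder (`encode`, not the paper's architecture) that handles translation by centering the cloud on its centroid, which is one common way to extend rotation equivariance to SE(3).

```python
import numpy as np


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Sample a random proper rotation (det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs so Q is uniquely defined
    if np.linalg.det(q) < 0:     # reflection -> flip one column to get a rotation
        q[:, 0] = -q[:, 0]
    return q


def vn_linear(v: np.ndarray, w: np.ndarray) -> np.ndarray:
    """VN linear layer: mixes feature channels, never the xyz coordinates.

    v: (N, C_in, 3) vector features, w: (C_out, C_in) weights -> (N, C_out, 3).
    """
    return np.einsum("oc,ncd->nod", w, v)


def encode(points: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Hypothetical toy encoder: center the cloud (removes translation), lift each
    point to a single vector channel, then apply one VN linear layer."""
    centered = points - points.mean(axis=0, keepdims=True)
    return vn_linear(centered[:, None, :], w)


rng = np.random.default_rng(0)
points = rng.normal(size=(128, 3))  # stand-in for an observed point cloud (row vectors)
w = rng.normal(size=(16, 1))        # 1 input channel -> 16 output channels
R = random_rotation(rng)
t = rng.normal(size=(1, 3))

feat = encode(points, w)
feat_moved = encode(points @ R + t, w)  # same cloud after a rigid motion

# Rotation passes through the features; translation is removed by centering.
print(np.allclose(feat_moved, feat @ R))
```

Because the rotation acts on the last axis while the weights act on the channel axis, the two operations commute; a real VN network would add equivariant nonlinearities on top of this.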
Abstract
Object pose estimation from a single view remains a challenging problem. In particular, partial observability, occlusions, and object symmetries can result in pose ambiguity. To account for this multimodality, this work proposes training a diffusion-based generative model for 6D object pose estimation. During inference, the trained generative model allows for sampling multiple particles, i.e., pose hypotheses. To distill this information into a single pose estimate, we propose two novel and effective pose selection strategies that require neither additional training nor computationally intensive operations. Moreover, while many existing methods for pose estimation primarily focus on the image domain and only incorporate depth information for final pose refinement, our model operates solely on point cloud data. It thereby leverages recent advances in point cloud processing and operates on an SE(3)-equivariant latent space, which forms the basis for the particle selection strategies and enables faster inference. Our thorough experimental results demonstrate the competitive performance of our approach on the Linemod dataset and showcase the effectiveness of our design choices. Code is available at https://github.com/zitronian/6DPoseDiffusion.
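The abstract does not spell out the two selection strategies, and the actual ones operate on the SE(3)-equivariant latent space. Purely to illustrate what distilling many particles into one estimate can look like, here is a hypothetical consensus rule applied directly in pose space instead: each particle counts how many other particles lie within a rotation threshold (geodesic angle) and a translation threshold, and the most-supported particle is returned. The function names, thresholds (degrees and metres), and toy data are all assumptions.

```python
import numpy as np


def rot_z(deg: float) -> np.ndarray:
    """Rotation about the z-axis by `deg` degrees."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def select_by_consensus(rotations, translations, max_angle_deg=10.0, max_dist=0.02):
    """Hypothetical rule: pick the particle with the most neighbours in pose space.

    rotations: (K, 3, 3), translations: (K, 3). Two particles are neighbours if
    their geodesic rotation distance and translation distance are both small.
    """
    # trace(R_i^T R_j) equals the sum of elementwise products of R_i and R_j
    tr = np.einsum("iab,jab->ij", rotations, rotations)
    angles = np.degrees(np.arccos(np.clip((tr - 1.0) / 2.0, -1.0, 1.0)))
    dists = np.linalg.norm(translations[:, None, :] - translations[None, :, :], axis=-1)
    neighbours = (angles <= max_angle_deg) & (dists <= max_dist)
    np.fill_diagonal(neighbours, False)  # a particle does not vote for itself
    counts = neighbours.sum(axis=1)
    return int(np.argmax(counts)), counts  # ties go to the lowest index


# Toy particles: four agree near 0 deg, two outliers at 90 and 200 deg.
rotations = np.stack([rot_z(d) for d in (90.0, 1.0, -2.0, 3.0, 0.0, 200.0)])
translations = np.array([
    [0.000, 0.000, 0.5],
    [0.000, 0.000, 0.5],
    [0.005, 0.000, 0.5],
    [0.000, 0.005, 0.5],
    [0.003, 0.003, 0.5],
    [0.100, 0.000, 0.5],
])
best, counts = select_by_consensus(rotations, translations)
print(best, counts)
```

Selecting an existing particle, rather than averaging all of them, matters for symmetric objects: the average of hypotheses drawn from two equally valid modes generally lies between them and matches neither.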
