SE(3)-PoseFlow: Estimating 6D Pose Distributions for Uncertainty-Aware Robotic Manipulation
Yufeng Jin, Niklas Funk, Vignesh Prasad, Zechu Li, Mathias Franzius, Jan Peters, Georgia Chalvatzaki
TL;DR
This work tackles the challenge of uncertain, multi-modal 6D object pose estimation under occlusions and symmetry by modeling full pose distributions with SE(3) flow matching. The authors introduce a probabilistic pipeline that combines dual-stream RGB-D encoders, DiT* masked cross-attention, and SE(3) velocity-field regression to sample multiple pose hypotheses from the conditional distribution $p(R,p \mid O,I)$. They propose two pose-selection schemes (model-free clustering and geometry-based re-ranking) and demonstrate how the learned distributions enable active perception and uncertainty-aware grasping in real robotic setups. Empirical results on REAL275, YCB-V, and LM-O show state-of-the-art or competitive performance for probabilistic pose estimation, with ablations highlighting the benefits of masking, RGB cues, and SDF-based scoring. The approach offers practical advantages for safe manipulation in cluttered and ambiguous environments, though future work is needed to scale to multi-object scenes and to integrate Bayesian inference over pose samples.
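To make the idea of model-free pose selection concrete, the following is a minimal sketch of one way to pick a representative pose from a set of sampled hypotheses: count, for each sample, how many other samples lie within a rotation and translation threshold, and keep the sample with the most neighbours. This is an illustration under stated assumptions, not the authors' clustering procedure; the function names (`geodesic_angle`, `select_pose`, `rot_z`), the thresholds, and the toy hypotheses are all hypothetical.

```python
import math

def geodesic_angle(Ra, Rb):
    """Angle (radians) of the relative rotation between two 3x3 rotation matrices."""
    # trace(Ra^T Rb) equals the sum of elementwise products of Ra and Rb.
    trace = sum(Ra[i][j] * Rb[i][j] for i in range(3) for j in range(3))
    return math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))

def select_pose(poses, rot_thresh=math.radians(15.0), trans_thresh=0.02):
    """Pick the sample with the most neighbours (a crude mode estimate).

    poses: list of (R, t), with R a 3x3 nested list and t a length-3 sequence.
    The thresholds are assumed values chosen for this toy example.
    Returns (index of the selected sample, neighbour count per sample).
    """
    counts = []
    for Ri, ti in poses:
        n = 0
        for Rj, tj in poses:
            if geodesic_angle(Ri, Rj) <= rot_thresh and math.dist(ti, tj) <= trans_thresh:
                n += 1  # each sample also counts itself
        counts.append(n)
    best = max(range(len(poses)), key=counts.__getitem__)
    return best, counts

def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Toy hypotheses: three samples that roughly agree and one flipped by 180 degrees.
poses = [
    (rot_z(0.00), (0.0, 0.0, 0.000)),
    (rot_z(0.05), (0.0, 0.0, 0.005)),
    (rot_z(0.10), (0.0, 0.0, 0.010)),
    (rot_z(math.pi), (0.0, 0.0, 0.000)),
]
best, counts = select_pose(poses)
```

In this toy case the three nearby samples support each other and the flipped sample is isolated. For a symmetric object, several well-separated groups with similar counts would instead signal genuine pose ambiguity rather than outliers.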
Abstract
Object pose estimation is a fundamental problem in robotics and computer vision, yet it remains challenging due to partial observability, occlusions, and object symmetries, which inevitably lead to pose ambiguity and multiple hypotheses consistent with the same observation. While deterministic deep networks achieve impressive performance under well-constrained conditions, they are often overconfident and fail to capture the multi-modality of the underlying pose distribution. To address these challenges, we propose a novel probabilistic framework that leverages flow matching on the SE(3) manifold for estimating 6D object pose distributions. Unlike existing methods that regress a single deterministic output, our approach models the full pose distribution with a sample-based estimate and enables reasoning about uncertainty in ambiguous cases such as symmetric objects or severe occlusions. We achieve state-of-the-art results on REAL275, YCB-V, and LM-O, and demonstrate how our sample-based pose estimates can be leveraged in downstream robotic manipulation tasks such as active perception for disambiguating uncertain viewpoints or guiding grasp synthesis in an uncertainty-aware manner.
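As a rough illustration of what sampling from a flow-matching model on SE(3) involves, the sketch below integrates a velocity field from a starting pose at t = 0 to a final pose at t = 1 with Euler steps, using the SO(3) exponential map so that each rotation update stays on the manifold. Several parts are assumptions made for illustration: SE(3) is treated as the product of SO(3) and R^3, the angular velocity is expressed in the body frame, and `oracle_velocity` is a hand-written stand-in that points along the geodesic to a fixed target pose, where a real system would query a learned, observation-conditioned network. All function names are hypothetical.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def so3_exp(w):
    """Rodrigues' formula: rotation vector -> 3x3 rotation matrix."""
    theta = math.sqrt(sum(a * a for a in w))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = (a / theta for a in w)
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [[c + x * x * v, x * y * v - z * s, x * z * v + y * s],
            [y * x * v + z * s, c + y * y * v, y * z * v - x * s],
            [z * x * v - y * s, z * y * v + x * s, c + z * z * v]]

def so3_log(R):
    """3x3 rotation matrix -> rotation vector (not robust near an angle of pi)."""
    cos_t = max(-1.0, min(1.0, (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0))
    theta = math.acos(cos_t)
    if theta < 1e-9:
        return [0.0, 0.0, 0.0]
    f = theta / (2.0 * math.sin(theta))
    return [f * (R[2][1] - R[1][2]), f * (R[0][2] - R[2][0]), f * (R[1][0] - R[0][1])]

def oracle_velocity(R, p, t, target):
    """Stand-in for a learned network: geodesic velocity toward a fixed target pose."""
    R1, p1 = target
    omega = [w / (1.0 - t) for w in so3_log(matmul(transpose(R), R1))]  # body frame
    v = [(b - a) / (1.0 - t) for a, b in zip(p, p1)]
    return omega, v

def integrate_pose(velocity_field, R, p, steps=10):
    """Euler integration of the velocity field from t = 0 to t = 1."""
    dt = 1.0 / steps
    for k in range(steps):
        omega, v = velocity_field(R, p, k * dt)
        R = matmul(R, so3_exp([dt * w for w in omega]))  # update stays on SO(3)
        p = [a + dt * b for a, b in zip(p, v)]
    return R, p

# Toy run: start at the identity pose and flow to a hypothetical target pose.
target = (so3_exp([0.0, 0.0, 1.0]), [0.1, 0.2, 0.3])
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R, p = integrate_pose(lambda R, p, t: oracle_velocity(R, p, t, target),
                      identity, [0.0, 0.0, 0.0])
```

In a sample-based estimator, this integration would be repeated from many random starting poses, so that the resulting set of end poses approximates the pose distribution; with the single-target oracle used here, every run simply converges to the same pose.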
