Generative 6D Pose Estimation via Conditional Flow Matching
Amir Hamza, Davide Boscaini, Weihang Li, Benjamin Busam, Fabio Poiesi
TL;DR
This work tackles instance-level 6D pose estimation under challenging conditions such as object symmetries and occlusions. It reframes the problem as conditional flow matching in $\mathbb{R}^3$ and introduces Flose, a three-stage pipeline that fuses overlap-aware geometry with semantic features from a Vision Foundation Model (DINOv2) to condition a denoising flow, followed by RANSAC-based registration and ICP refinement. On five BOP datasets, Flose achieves state-of-the-art results, with an average improvement of +4.5 Average Recall (AR) over prior methods, while reducing training and inference costs compared to per-object models. By coupling appearance cues with robust outlier filtering, Flose is more robust to symmetries and occlusions, and the number of denoising steps offers a controllable accuracy–efficiency trade-off.
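To make the role of the number of denoising steps concrete, the following is a minimal sketch of flow-matching inference in $\mathbb{R}^3$, not Flose's implementation. It assumes a straight-line (rectified) probability path and forward Euler integration, and it replaces the trained, feature-conditioned network with a hypothetical oracle velocity field that already knows the target points. The names `euler_flow` and `make_oracle_velocity` are illustrative only.

```python
import numpy as np

def euler_flow(x0, velocity, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with forward Euler."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # t stays strictly below 1, so 1 - t never reaches zero
        x = x + dt * velocity(x, t)
    return x

def make_oracle_velocity(x1):
    """Hypothetical stand-in for a trained network (assumption, not Flose).

    On the straight path x_t = (1 - t) * x0 + t * x1, the expression
    (x1 - x_t) / (1 - t) equals the constant velocity x1 - x0.
    """
    def velocity(x, t):
        return (x1 - x) / (1.0 - t)
    return velocity

rng = np.random.default_rng(0)
target = rng.uniform(-1.0, 1.0, size=(128, 3))  # assumed clean 3D point positions
noise = rng.standard_normal((128, 3))           # Gaussian starting sample in R^3

denoised = euler_flow(noise, make_oracle_velocity(target), num_steps=10)
print(np.abs(denoised - target).max())
```

With the oracle field, the result matches the target for any step count, so the sketch only illustrates the mechanics. With a learned velocity field, each step costs one network evaluation, so fewer steps run faster but integrate the flow more coarsely.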
Abstract
Existing methods for instance-level 6D pose estimation typically rely on neural networks that either directly regress the pose in $\mathrm{SE}(3)$ or estimate it indirectly via local feature matching. The former struggle with object symmetries, while the latter fail in the absence of distinctive local features. To overcome these limitations, we propose a novel formulation of 6D pose estimation as a conditional flow matching problem in $\mathbb{R}^3$. We introduce Flose, a generative method that infers object poses via a denoising process conditioned on local features. While prior approaches based on conditional flow matching perform denoising solely based on geometric guidance, Flose integrates appearance-based semantic features to mitigate ambiguities caused by object symmetries. We further incorporate RANSAC-based registration to handle outliers. We validate Flose on five datasets from the established BOP benchmark. Flose outperforms prior methods with an average improvement of +4.5 Average Recall. Project website: https://tev-fbk.github.io/Flose/
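As an illustration of how RANSAC-based registration can absorb outliers, the sketch below recovers a rigid pose from 3D-3D point correspondences, some of which are corrupted. It is a generic NumPy example under stated assumptions, not the authors' code: it assumes the correspondences pair object points with their predicted positions, uses the Kabsch (SVD) solver on minimal samples of three points, and picks the threshold, iteration count, and function names (`kabsch`, `ransac_rigid`) purely for illustration.

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def ransac_rigid(src, dst, num_iters=200, inlier_thresh=0.02, seed=0):
    """Fit (R, t) on random 3-point samples, keep the largest consensus set."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(num_iters):
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = kabsch(src[idx], dst[idx])
        residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = residuals < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise RuntimeError("RANSAC found no consensus set")
    R, t = kabsch(src[best_inliers], dst[best_inliers])  # refit on all inliers
    return R, t, best_inliers

# Synthetic check: 200 correspondences, 30% replaced by random outliers.
rng = np.random.default_rng(1)
angle = 0.7
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.3, -0.1, 0.5])
src = rng.uniform(-1.0, 1.0, size=(200, 3))
dst = src @ R_true.T + t_true
outlier_idx = rng.choice(200, size=60, replace=False)
dst[outlier_idx] = rng.uniform(-2.0, 2.0, size=(60, 3))

R_est, t_est, inliers = ransac_rigid(src, dst)
```

A single least-squares fit over all 200 pairs would be pulled away from the true pose by the corrupted correspondences. Scoring each hypothesis by its consensus set and refitting on the inliers alone avoids that.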
