Co-op: Correspondence-based Novel Object Pose Estimation
Sungphill Moon, Hyeontae Son, Dongcheol Hur, Sangwook Kim
TL;DR
Co-op tackles 6DoF pose estimation of unseen objects from a single RGB image by learning correspondences between the input and a small set of pre-rendered templates, enabling robust generalization without per-object training. The method pairs a coarse stage, which finds semi-dense correspondences through patch-level classification and offset regression, with a dense refinement stage, which uses probabilistic flow and a differentiable PnP layer to update the pose precisely in a render-and-compare loop. An optional pose-selection step over multiple hypotheses further boosts accuracy, and CroCo-pretrained transformers underpin the entire architecture. On the seven core BOP datasets, Co-op delivers state-of-the-art performance in RGB-only settings and maintains strong results with RGB-D inputs, demonstrating fast, accurate, and robust pose estimation for unseen objects in cluttered scenes.
Abstract
We propose Co-op, a novel method for accurately and robustly estimating the 6DoF pose of objects unseen during training from a single RGB image. Our method requires only the CAD model of the target object and can precisely estimate its pose without any additional fine-tuning. While existing model-based methods suffer from inefficiency due to using a large number of templates, our method enables fast and accurate estimation with a small number of templates. This improvement is achieved by finding semi-dense correspondences between the input image and the pre-rendered templates. Our method achieves strong generalization performance by leveraging a hybrid representation that combines patch-level classification and offset regression. Additionally, our pose refinement model estimates probabilistic flow between the input image and the rendered image, refining the initial estimate to an accurate pose using a differentiable PnP layer. We demonstrate that our method not only estimates object poses rapidly but also outperforms existing methods by a large margin on the seven core datasets of the BOP Challenge, achieving state-of-the-art accuracy.
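To make the hybrid representation concrete, the following is a minimal sketch of how patch-level classification scores and regressed offsets could be decoded into 2D-2D correspondences between a query image and one template. It is an illustration under stated assumptions, not the authors' implementation: the function names, the tensor shapes, the trailing "no match" class, the patch size of 16, the offset convention (patch units relative to the matched template patch center), and the confidence threshold are all hypothetical choices.

```python
import numpy as np


def patch_centers(grid_h, grid_w, patch_size):
    """Pixel coordinates (x, y) of every patch center, in row-major order."""
    ys, xs = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    cells = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(np.float64)
    return (cells + 0.5) * patch_size


def decode_correspondences(logits, offsets, grid_h, grid_w,
                           patch_size=16, min_prob=0.5):
    """Turn per-patch predictions into pixel correspondences.

    logits:  (N, N + 1) scores of each query patch against the N template
             patches; the last column is an assumed "no match" class.
    offsets: (N, 2) regressed (x, y) offsets in patch units, assumed to be
             relative to the center of the matched template patch.
    """
    # Softmax over template patches (numerically stabilized).
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)

    best = probs.argmax(axis=1)                  # classification result
    conf = probs[np.arange(len(best)), best]     # its probability
    no_match = logits.shape[1] - 1
    keep = (best != no_match) & (conf >= min_prob)

    centers = patch_centers(grid_h, grid_w, patch_size)
    query_px = centers[keep]
    # Coarse patch location from classification, refined by the offset.
    template_px = centers[best[keep]] + offsets[keep] * patch_size
    return query_px, template_px, conf[keep]


# Toy example on a 2x2 patch grid (4 patches, 16 px each).
logits = np.zeros((4, 5))
logits[0, 3] = 10.0   # query patch 0 matches template patch 3
logits[1, 4] = 10.0   # query patch 1 is classified as "no match"
logits[3, 0] = 10.0   # query patch 3 matches template patch 0
# Query patch 2 keeps uniform scores, so it falls below min_prob.
offsets = np.zeros((4, 2))
offsets[0] = [0.25, -0.5]

query_px, template_px, conf = decode_correspondences(logits, offsets, 2, 2)
```

In this toy setup, two of the four query patches survive filtering. Classification alone would localize each match only to the nearest patch center; the regressed offset is what moves it to a sub-patch position. The resulting pixel pairs, once lifted to 3D through the template's known rendering, are the kind of input a PnP solver would consume.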
