CPPF++: Uncertainty-Aware Sim2Real Object Pose Estimation by Vote Aggregation
Yang You, Wenhao He, Jin Liu, Hongkai Xiong, Weiming Wang, Cewu Lu
TL;DR
CPPF++ advances sim-to-real 6D pose estimation by recasting point-pair voting as a probabilistic process in canonical space, which mitigates vote collisions. It introduces N-point tuples to enrich the context of each vote, a noisy-pair filtering scheme for robustness, and an online alignment optimization that refines poses at inference time. Training uses only synthetic data derived from CAD models, with no backgrounds. The approach substantially outperforms prior sim-to-real methods on NOCS REAL275, generalizes well to unseen datasets, and is competitive with methods trained on real data. The new DiversePose 300 dataset adds a challenging benchmark with diverse poses and backgrounds for evaluating category-level pose estimation under real-world conditions.
Abstract
Object pose estimation is a critical problem in 3D vision. Current state-of-the-art methods that rely on real-world pose annotations perform well, but collecting such real training data is expensive. This paper focuses on a setting in which only 3D CAD models are available as prior knowledge, with no background or clutter information. We introduce CPPF++, a novel method for sim-to-real pose estimation. It builds on the point-pair voting scheme of CPPF and reformulates it from a probabilistic perspective. To address vote collision, we model voting uncertainty by estimating the probability distribution of each point pair in the canonical space. We further enrich the contextual information available to each voting unit by introducing N-point tuples. To improve robustness and accuracy, we incorporate several new modules: noisy pair filtering, online alignment optimization, and a tuple feature ensemble. Alongside these methodological advances, we introduce a new category-level pose estimation dataset, DiversePose 300. Experiments show that our method significantly surpasses previous sim-to-real approaches and achieves comparable or superior performance on novel datasets. Our code is available at https://github.com/qq456cvb/CPPF2.
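To make the idea of probabilistic vote aggregation concrete, here is a minimal NumPy sketch of uncertainty-aware center voting. It is not the CPPF++ implementation. It assumes that each point pair predicts an isotropic Gaussian over two canonical coordinates of the object center: mu, the projection of the center offset onto the pair direction, and nu, the distance from the center to the pair's line. Every candidate center on a grid is scored by summing the pairs' log-likelihoods. Noisy pair filtering is imitated by discarding pairs whose predicted spread exceeds a threshold. The names `pair_coordinates`, `aggregate_votes`, and `max_sigma` are hypothetical, and the network predictions are simulated.

```python
import numpy as np


def pair_coordinates(centers, p1, p2):
    """Canonical coordinates (mu, nu) of each candidate center w.r.t. each pair.

    mu: signed projection of (center - p1) onto the unit direction p1 -> p2.
    nu: distance from the center to the line through p1 and p2.
    """
    d = p2 - p1
    d = d / np.linalg.norm(d, axis=1, keepdims=True)      # (P, 3) unit directions
    rel = centers[:, None, :] - p1[None, :, :]            # (C, P, 3)
    mu = np.einsum("cpk,pk->cp", rel, d)                  # (C, P)
    perp = rel - mu[..., None] * d[None, :, :]            # component orthogonal to d
    nu = np.linalg.norm(perp, axis=-1)                    # (C, P)
    return mu, nu


def aggregate_votes(centers, p1, p2, mu_hat, nu_hat, sigma, max_sigma=None):
    """Return the candidate center with the highest summed log-likelihood.

    Assumption: each pair predicts an isotropic Gaussian N((mu_hat, nu_hat), sigma^2).
    """
    if max_sigma is not None:
        # Hypothetical stand-in for noisy pair filtering: drop uncertain pairs.
        keep = sigma <= max_sigma
        p1, p2, mu_hat, nu_hat, sigma = (
            a[keep] for a in (p1, p2, mu_hat, nu_hat, sigma)
        )
    mu, nu = pair_coordinates(centers, p1, p2)
    sq = ((mu - mu_hat) ** 2 + (nu - nu_hat) ** 2) / sigma**2
    log_lik = -0.5 * sq - 2.0 * np.log(sigma)   # 2D Gaussian, constant term dropped
    scores = log_lik.sum(axis=1)                 # soft votes summed over pairs
    return centers[np.argmax(scores)], scores


# Usage with simulated predictions (no network involved).
rng = np.random.default_rng(0)
true_c = np.array([0.1, -0.2, 0.3])
pts = true_c + 0.3 * rng.normal(size=(200, 3))
i, j = rng.integers(0, 200, size=(2, 300))
p1, p2 = pts[i[i != j]], pts[j[i != j]]          # skip degenerate self-pairs
mu_t, nu_t = pair_coordinates(true_c[None, :], p1, p2)
mu_hat = mu_t[0] + rng.normal(scale=0.01, size=len(p1))
nu_hat = nu_t[0] + rng.normal(scale=0.01, size=len(p1))
sigma = np.full(len(p1), 0.02)
bad = rng.random(len(p1)) < 0.2                  # corrupt roughly 20% of pairs
mu_hat[bad] += rng.normal(scale=0.5, size=bad.sum())
sigma[bad] = 0.5                                 # and mark them as uncertain
axis = np.linspace(-0.5, 0.5, 11)                # 0.1 grid spacing
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), -1).reshape(-1, 3)
est, _ = aggregate_votes(grid, p1, p2, mu_hat, nu_hat, sigma, max_sigma=0.1)
print(est)
```

Each pair contributes a soft likelihood scaled by its own predicted uncertainty rather than a single hard vote. As a result, ambiguous pairs spread their mass thinly instead of adding sharp spurious peaks. The sketch covers center voting only. Rotation and scale estimation, N-point tuples, and online alignment optimization are outside its scope.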
