GIVEPose: Gradual Intra-class Variation Elimination for RGB-based Category-Level Object Pose Estimation
Zinqin Huang, Gu Wang, Chenyangguang Zhang, Ruida Zhang, Xiu Li, Xiangyang Ji
TL;DR
GIVEPose tackles RGB-based category-level object pose estimation by addressing the intra-class variation that arises when pose is regressed from the NOCS map. It introduces the Intra-class Variation-Free Consensus (IVFC) map, derived from a category-level consensus model, and a Deformable Convolutional Auto-Encoder (DCAE) that gradually eliminates instance-specific information from the NOCS map to produce the IVFC map. Pose is then regressed from the IVFC map combined with 2D ROI information, while object size is inferred from backbone features, enabling end-to-end category-level pose estimation from RGB input alone. Evaluations on CAMERA25, REAL275, and Wild6D show substantial improvements over prior RGB-based methods, along with robust handling of intra-class variation and truncation in real-world scenarios. Code is released to support reproducibility.
Abstract
Recent advances in RGBD-based category-level object pose estimation have been limited by their reliance on precise depth information, restricting their broader applicability. In response, RGB-based methods have been developed. Among these methods, geometry-guided pose regression, which originated in instance-level tasks, has demonstrated strong performance. However, we argue that the NOCS map is an inadequate intermediate representation for geometry-guided pose regression methods, as its many-to-one correspondence with category-level pose introduces redundant instance-specific information, resulting in suboptimal results. This paper identifies the intra-class variation problem inherent in pose regression based solely on the NOCS map and proposes the Intra-class Variation-Free Consensus (IVFC) map, a novel coordinate representation generated from the category-level consensus model. By leveraging the complementary strengths of the NOCS map and the IVFC map, we introduce GIVEPose, a framework that implements Gradual Intra-class Variation Elimination for category-level object pose estimation. Extensive evaluations on both synthetic and real-world datasets demonstrate that GIVEPose significantly outperforms existing state-of-the-art RGB-based approaches. Our code is available at https://github.com/ziqin-h/GIVEPose.
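The many-to-one correspondence mentioned above can be illustrated with a toy sketch. It is not taken from the GIVEPose code: the box-shaped "instances", the helper names `box_corners` and `nocs_coords`, and the normalization convention (tight bounding box centered at the origin with unit diagonal) are assumptions made only for illustration. It shows that two instances of one category, observed under the same pose, yield different normalized coordinates.

```python
import numpy as np

def box_corners(extents):
    """Return the 8 corners of an axis-aligned box centered at the origin."""
    signs = np.array(
        [[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
        dtype=float,
    )
    return 0.5 * signs * np.asarray(extents, dtype=float)

def nocs_coords(points):
    """Center the tight bounding box at the origin and scale its diagonal to 1.

    Assumed convention for this sketch; real pipelines may add an offset
    so that coordinates fall in [0, 1].
    """
    lo, hi = points.min(axis=0), points.max(axis=0)
    return (points - (lo + hi) / 2.0) / np.linalg.norm(hi - lo)

# Two hypothetical instances of the same category in the same (identity) pose.
instance_a = box_corners([1.0, 1.0, 1.0])
instance_b = box_corners([1.0, 2.0, 1.0])

nocs_a = nocs_coords(instance_a)
nocs_b = nocs_coords(instance_b)

# The pose is identical, yet the coordinate targets differ with instance shape.
nocs_differs = not np.allclose(nocs_a, nocs_b)
print(nocs_differs)
print(nocs_a.max(axis=0), nocs_b.max(axis=0))
```

In this toy setting the two coordinate sets differ although the pose is the same, so a regressor fed such coordinates must also cope with shape differences. Coordinates taken from a single shared category shape would, by construction, not vary with the instance, which is the property the abstract attributes to the IVFC map.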
