GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement

Linfang Zheng, Tze Ho Elden Tse, Chen Wang, Yinghan Sun, Hua Chen, Ales Leonardis, Wei Zhang

TL;DR

This work tackles category-level object pose refinement under substantial intra-class shape variation by introducing GeoReF, a refinement framework that fuses observed point clouds with category priors. Key innovations include a Hybrid-Scope (HS) feature extractor, learnable affine transformations (LAT) for adaptive alignment, and a cross-cloud transformation (CCT) mechanism to merge information from heterogeneous inputs; shape priors are also integrated to improve translation and size estimation. Extensive ablations and experiments on REAL275 and CAMERA25 demonstrate consistent, significant improvements over state-of-the-art baselines, including CATRE, particularly in scenarios with varying priors and initial estimations. The approach offers robust generalization, data-efficient learning, and a practical pathway toward more reliable category-level pose refinement in real-world applications.

Abstract

Object pose refinement is essential for robust object pose estimation. Previous work has made significant progress towards instance-level object pose refinement. Yet, category-level pose refinement is a more challenging problem due to large shape variations within a category and the discrepancies between the target object and the shape prior. To address these challenges, we introduce a novel architecture for category-level object pose refinement. Our approach integrates an HS-layer and learnable affine transformations, which aim to enhance the extraction and alignment of geometric information. Additionally, we introduce a cross-cloud transformation mechanism that efficiently merges diverse data sources. Finally, we push the limits of our model by incorporating the shape prior information for translation and size error prediction. Extensive quantitative experiments demonstrate significant improvement over the baseline method across all metrics.

Paper Structure (32 sections, 4 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Illustration of the shape variation. SP-m represents the category's mean shape; SP-1 and SP-2 represent randomly sampled object shapes from the CAMERA25 training set.
  • Figure 2: Overall structure of the proposed method. Our object pose refinement structure contains three main modules. Given the shape prior point cloud, the target object's observed point cloud, and the initial estimation, we first apply point cloud focalization to the input point clouds using the initial estimation. The focalized point clouds then pass through a geometry-based feature extraction encoder to obtain geometric structural features. The extracted features are then fed into two branches: one for rotation error estimation, and one for translation error and size error estimation. Within the HS Feature Extractor, the Matrix Net models output the learnable affine transformations (LATs) for adaptive point and feature adjustment. The output of the left Matrix Net adjusts the input point clouds, while the right Matrix Net outputs two affine transformations: one for adjusting the rotation features, and one for adjusting the translation and size features.
  • Figure 3: Performance comparison of our method and CATRE under different shape priors. SP-m denotes the category's mean shape, while SP-1 and SP-2 are randomly sampled object shapes from CAMERA25.
  • Figure 4: Qualitative comparison of proposed (row #3) and baseline (row #2) methods using SPD (row #1) as initial estimation. Ground truth shown with white lines. Note that the estimated rotations of symmetric objects (e.g. bowl, bottle, and can) are considered correct if the symmetry axis is aligned.
  • Figure 5: Comparison of the proposed (row #2) and baseline (row #1) methods during a complete refinement iteration, both using SPD as the initial estimation. The ground truth is represented by white lines.
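
The Figure 4 caption notes that rotations of symmetric objects (bowl, bottle, can) count as correct when the symmetry axis is aligned. The sketch below illustrates that idea by comparing a full rotation error with an axis-only error. It is not the paper's evaluation code: the function names are hypothetical, and treating the object's y-axis as the symmetry axis is an assumption made here for illustration.

```python
import math

def rotation_about_x(angle_rad):
    """3x3 rotation matrix about the x-axis (row-major nested lists)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rotation_about_y(angle_rad):
    """3x3 rotation matrix about the y-axis (row-major nested lists)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def _acos_deg(cos_value):
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_value))))

def full_rotation_error_deg(r_est, r_gt):
    """Geodesic angle between two rotations: acos((trace(R_est^T R_gt) - 1) / 2)."""
    trace = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    return _acos_deg((trace - 1.0) / 2.0)

def symmetric_rotation_error_deg(r_est, r_gt, axis=1):
    """Angle between the symmetry axes only (axis=1 is the y-axis; an assumption)."""
    # Column `axis` of a rotation matrix is the object's axis expressed in the camera frame.
    est_axis = [r_est[i][axis] for i in range(3)]
    gt_axis = [r_gt[i][axis] for i in range(3)]
    return _acos_deg(sum(e * g for e, g in zip(est_axis, gt_axis)))

# The estimate differs from the ground truth only by a 1-radian spin about
# the object's own y-axis, which is invisible for an axis-symmetric object.
r_gt = rotation_about_x(0.3)
r_est = matmul(r_gt, rotation_about_y(1.0))
print(full_rotation_error_deg(r_est, r_gt))       # about 57.3 degrees
print(symmetric_rotation_error_deg(r_est, r_gt))  # close to 0 degrees
```

Under the axis-only measure, the spin about the symmetry axis contributes no error, which matches the caption's criterion. The full geodesic error would penalize that same estimate by the whole spin angle.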