
Category-Agnostic Pose Estimation for Point Clouds

Bowen Liu, Wei Liu, Siang Chen, Pengwei Xie, Guijin Wang

TL;DR

The paper tackles the problem of generalizing 6D pose estimation to unseen object categories by introducing rotation-invariant patch features learned in a category-agnostic setting. It presents an end-to-end pipeline that combines PatchNet for patch estimation and a PointMLP-based backbone for pose regression, using a loss that balances pose reconstruction, patch accuracy, and symmetry handling. Key contributions include a semi-automatic patch annotation workflow, a rotation-invariant patch design, and a symmetry-aware loss, yielding competitive results on CAMERA25 and ModelNet40 without category information and demonstrating generalization to novel categories. This approach offers a practical pathway toward robust pose estimation in real-world, category-rich environments where category labels are unavailable or unreliable.

Abstract

The goal of object pose estimation is to visually determine the pose of a specific object in the RGB-D input. Unfortunately, both instance-based and category-based methods fail on unseen objects from unseen categories, which remains a challenge for pose estimation. To address this issue, this paper proposes a method that introduces geometric features for pose estimation of point clouds without requiring category information. The method relies only on the patch feature of the point cloud, a rotation-invariant geometric feature. After training without category information, our method achieves results comparable to those of category-based methods. Our method successfully performs pose annotation of instances without category information on the CAMERA25 and ModelNet40 datasets.

Paper Structure (17 sections, 3 equations, 5 figures, 3 tables, 1 algorithm)

Figures (5)

  • Figure 1: Most methods of pose estimation based on a single category are complicated and difficult to generalize to other categories. To address this issue, we propose a method based on geometric features for category-agnostic pose estimation.
  • Figure 2: Point cloud pose estimation pipeline (colored models are shown instead of point clouds for readability). The randomly rotated point cloud $\mathbf Y$ is the network input. PatchNet predicts the patches of the point cloud to generate geometric features, which are combined with the global point cloud and fed into a network with a PointMLP backbone. The network outputs $\Delta \mathbf q$ to predict and compensate for the 60 rotation modes of the icosahedron, yielding the corrected pose. The loss between point clouds is measured with the chamfer distance.
  • Figure 3: The semi-automatic patch annotation process. "Semi-automatic" means that some parameters are selected manually, which may cause the results to differ between objects.
  • Figure 4: Visualization of the semi-automatic patch annotation process. To improve readability, we use the patch center to represent the patch area. The patch consistently appears at the top and bottom of objects in the bottle category. This result was obtained with $N=1024$, $M=20$, and $th=10$.
  • Figure 5: Visualization of results for several categories in the CAMERA25 dataset. In most cases the pose estimates are satisfactory, but categories with significant intra-category geometric variation, such as cameras, remain challenging.
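
The Figure 2 caption states that the loss between point clouds is measured with the chamfer distance. The sketch below illustrates that metric in plain Python. It is not the authors' implementation: the function names (`squared_distance`, `chamfer_distance`, `rotate_z`) are hypothetical, and the convention used here (squared Euclidean distances, averaged per cloud, summed over both directions) is an assumption, since the page does not say which variant the paper uses. The two toy clouds also show why symmetry matters for such a loss: a cloud that maps onto itself under a rotation yields a near-zero distance even though the pose has changed.

```python
import math


def squared_distance(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def chamfer_distance(cloud_a, cloud_b):
    """Symmetric chamfer distance between two point clouds.

    Assumed convention: mean squared nearest-neighbour distance from
    each cloud to the other, summed over both directions.
    """
    def one_way(src, dst):
        return sum(min(squared_distance(p, q) for q in dst) for p in src) / len(src)

    return one_way(cloud_a, cloud_b) + one_way(cloud_b, cloud_a)


def rotate_z(cloud, angle):
    """Rotate every point of a cloud about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in cloud]


# A cloud with four-fold symmetry about z, and an asymmetric variant of it.
square = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
asymmetric = square + [(2.0, 0.0, 0.0)]

# The symmetric cloud is (numerically) unchanged by a 90-degree turn,
# so the distance cannot distinguish the two poses.
print(chamfer_distance(square, rotate_z(square, math.pi / 2)))

# The asymmetric cloud gives a clearly non-zero distance for the same turn.
print(chamfer_distance(asymmetric, rotate_z(asymmetric, math.pi / 2)))
```

This brute-force version costs O(|A|·|B|) distance evaluations, which is fine for illustration. For clouds of the size quoted in Figure 4 ($N=1024$), a vectorized or tree-based nearest-neighbour search would be the usual choice.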