ZeroPose: CAD-Prompted Zero-shot Object 6D Pose Estimation in Cluttered Scenes

Jianqiu Chen, Zikun Zhou, Mingshan Sun, Tianpeng Bao, Rui Zhao, Liwei Wu, Zhenyu He

TL;DR

Experimental results on seven datasets show that ZeroPose, as a zero-shot method, achieves performance comparable to object-specific trained methods and outperforms the state-of-the-art zero-shot method with a 50x improvement in inference speed.

Abstract

Many robotics and industrial applications demand the ability to estimate the 6D pose of novel objects in cluttered scenes. However, existing classic pose estimation methods are object-specific: they can only handle the specific objects seen during training. When applied to a novel object, these methods require a cumbersome onboarding process that involves extensive dataset preparation and model retraining. The long duration and high resource cost of onboarding limit their practicality in real-world applications. In this paper, we introduce ZeroPose, a novel zero-shot framework that performs pose estimation following a Discovery-Orientation-Registration (DOR) inference pipeline. This framework generalizes to novel objects without requiring model retraining. Given the CAD model of a novel object, ZeroPose onboards it within seconds by extracting visual and geometric embeddings from the CAD model to serve as a prompt. Prompted with these embeddings, DOR can discover all related instances and estimate their 6D poses without additional human interaction or assumptions about scene conditions. Unlike existing zero-shot methods based on the render-and-compare paradigm, the DOR pipeline formulates object pose estimation as a feature-matching problem, which avoids time-consuming online rendering and improves efficiency. Experimental results on seven datasets show that ZeroPose, as a zero-shot method, achieves performance comparable to object-specific trained methods and outperforms the state-of-the-art zero-shot method with a 50x improvement in inference speed.
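The abstract's central idea is to treat pose estimation as feature matching rather than render-and-compare. Once 3D points on the CAD model and on the observed instance have been paired by comparing their embeddings, a rigid 6D pose can be recovered in closed form, with no rendering in the loop. The sketch below is a minimal NumPy illustration under stated assumptions: it pairs points by mutual nearest neighbors in cosine similarity and solves the pose with the standard Kabsch (SVD) method. Both are swapped-in, textbook techniques, not the paper's registration solver, and the function names `mutual_nn_matches` and `kabsch` are hypothetical.

```python
import numpy as np


def mutual_nn_matches(feat_a: np.ndarray, feat_b: np.ndarray) -> np.ndarray:
    """Return (i, j) index pairs where feat_a[i] and feat_b[j] are mutual nearest
    neighbors under cosine similarity. feat_a: (N, D), feat_b: (M, D)."""
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    sim = a @ b.T                    # (N, M) cosine similarity matrix
    best_b = sim.argmax(axis=1)      # best partner in b for each row of a
    best_a = sim.argmax(axis=0)      # best partner in a for each row of b
    idx_a = np.arange(len(a))
    keep = best_a[best_b] == idx_a   # keep only pairs that agree both ways
    return np.stack([idx_a[keep], best_b[keep]], axis=1)


def kabsch(src: np.ndarray, dst: np.ndarray):
    """Least-squares rigid transform (R, t) such that dst ~= src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model_pts = rng.normal(size=(200, 3))    # hypothetical CAD model points
    model_feat = rng.normal(size=(200, 32))  # hypothetical per-point embeddings
    c, s = np.cos(0.7), np.sin(0.7)
    R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    t_true = np.array([0.1, -0.2, 0.5])
    perm = rng.permutation(200)              # observed points arrive unordered
    scene_pts = model_pts[perm] @ R_true.T + t_true
    scene_feat = model_feat[perm] + 0.01 * rng.normal(size=(200, 32))
    pairs = mutual_nn_matches(model_feat, scene_feat)
    R_est, t_est = kabsch(model_pts[pairs[:, 0]], scene_pts[pairs[:, 1]])
    print("max rotation error:", np.abs(R_est - R_true).max())
```

This sketch assumes clean matches. With real sensor data, a robust estimator such as RANSAC would typically wrap the closed-form solve to reject outlier correspondences.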

Paper Structure (19 sections, 8 equations, 10 figures, 8 tables)

Figures (10)

  • Figure 1: (a) The classic pose estimation method, when applied to a novel object, requires a cumbersome onboarding process to prepare a training dataset and retrain an object-specific model. The zero-shot pose estimation method adopts a pre-trained, generalizable model that needs no object-specific retraining, reducing the onboarding time from days to tens of seconds. (b) With CAD model prompting, the proposed ZeroPose solves both object discovery and pose estimation in a zero-shot manner.
  • Figure 2: A high-level overview of ZeroPose. Prompted by CAD models, ZeroPose performs both object discovery and pose estimation in a zero-shot manner following a Discovery-Orientation-Registration (DOR) inference pipeline.
  • Figure 3: An illustration of ZeroPose. ZeroPose begins by extracting embeddings from the CAD model as a prompt. With this prompting, the Discovery-Orientation-Registration (DOR) inference pipeline performs pose estimation in three inference steps. The Discovery step discovers objects in the cluttered scene: it computes the cosine similarity between the image embeddings of foreground instances, obtained from Segment Anything results on the scene, and the CAD model prompts, and associates instances with CAD models based on the similarity score. The Orientation step estimates the camera observation viewpoint in order to find the CAD model points that correspond to the visible instance points: based on the discovery results, it indexes the related CAD model patch embeddings and estimates the viewpoint from the nearest-neighbor distance between patch embedding features. The Registration step solves for the pose transformation by matching geometric embeddings between the instance point clouds and the filtered CAD model point clouds.
  • Figure 4: Left: The onboarding stage extracts visual and geometric embeddings from the CAD model as the prompt for the target object; it runs offline and only once per object. Right: Illustration of the Geo Model.
  • Figure 5: An illustration of the discovery step of ZeroPose.
  • ...and 5 more figures
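The first two DOR steps in the Figure 3 caption can be sketched as simple embedding comparisons. This is a minimal NumPy illustration under stated assumptions, not the authors' code: embeddings are taken as precomputed arrays (the image encoder and the Segment Anything call are out of scope), the similarity threshold is a hypothetical hyperparameter, the viewpoint score (mean distance from each instance patch to its nearest template patch) is one plausible reading of "nearest neighbor patch embedding feature distance", and the names `discover` and `orient` are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def discover(instance_emb: np.ndarray, prompt_emb: np.ndarray, threshold: float = 0.5):
    """Discovery: associate each segmented instance with its most similar CAD prompt.

    instance_emb: (I, D) image embeddings of foreground instance crops.
    prompt_emb:   (C, D) image embeddings of CAD model prompts, one per object.
    Returns (instance_idx, cad_idx, score) triples; instances whose best score is
    below the hypothetical `threshold` are treated as unrelated background.
    """
    sim = l2_normalize(instance_emb) @ l2_normalize(prompt_emb).T  # (I, C)
    best = sim.argmax(axis=1)
    results = []
    for i, c in enumerate(best):
        score = float(sim[i, c])
        if score >= threshold:
            results.append((i, int(c), score))
    return results


def orient(instance_patches: np.ndarray, view_patches: list):
    """Orientation: pick the CAD template view whose patches best explain the instance.

    instance_patches: (K, D) patch embeddings of the observed instance.
    view_patches:     list of (K_v, D) patch embeddings, one per template view.
    Returns (best_view_index, per-view scores); lower scores are better.
    """
    scores = []
    for patches in view_patches:
        # Pairwise Euclidean distances between instance and view patches: (K, K_v)
        d = np.linalg.norm(instance_patches[:, None, :] - patches[None, :, :], axis=-1)
        scores.append(d.min(axis=1).mean())  # mean nearest-neighbor distance
    scores = np.asarray(scores)
    return int(scores.argmin()), scores
```

The selected view then determines which CAD model points are likely visible, so only that filtered subset needs to enter the Registration step.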