Any6D: Model-free 6D Pose Estimation of Novel Objects

Taeyeop Lee, Bowen Wen, Minjun Kang, Gyuree Kang, In So Kweon, Kuk-Jin Yoon

TL;DR

Any6D tackles the problem of 6D pose estimation for novel objects without relying on textured CAD models or multiview references. It introduces a two-stage object alignment pipeline that reconstructs a normalized shape from a single RGB-D anchor, then estimates metric-scale size and pose, followed by refinement and a render-and-compare strategy to predict the relative pose to a query image. On five real-world datasets, Any6D achieves state-of-the-art results across standard pose metrics, demonstrating strong generalization to unseen objects, occlusions, and cross-environment variations. This approach reduces dependence on detailed object data and facilitates robust manipulation and augmented reality applications in real-world settings.

Abstract

We introduce Any6D, a model-free framework for 6D object pose estimation that requires only a single RGB-D anchor image to estimate both the 6D pose and size of unknown objects in novel scenes. Unlike existing methods that rely on textured 3D models or multiple viewpoints, Any6D leverages a joint object alignment process to enhance 2D-3D alignment and metric scale estimation for improved pose accuracy. Our approach integrates a render-and-compare strategy to generate and refine pose hypotheses, enabling robust performance in scenarios with occlusions, non-overlapping views, diverse lighting conditions, and large cross-environment variations. We evaluate our method on five challenging datasets: REAL275, Toyota-Light, HO3D, YCBInEOAT, and LM-O, demonstrating its effectiveness in significantly outperforming state-of-the-art methods for novel object pose estimation. Project page: https://taeyeop.com/any6d

Paper Structure

This paper contains 14 sections, 1 equation, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Our method accurately estimates 6D object pose for novel objects on drastically different scenes and viewpoints using only a single RGB-D anchor image. We achieve robust pose estimation without requiring precise CAD models or posed multi-view reference images.
  • Figure 2: Overview of the Any6D framework for model-free object pose estimation. First, we reconstruct normalized object shape $O_N$ from the image-to-3D model. Then, we estimate accurate object pose and size from anchor image $I_A$ using the proposed object alignment (Sec. \ref{subsec:object_alignemt}). Next, we use the query image $I_Q$ to estimate the pose with the reconstructed metric-scale object shape $O_M$ (Sec. \ref{subsec:pose_estimation}).
  • Figure 3: Visualization of each point cloud and its center for the mustard object.
  • Figure 4: Qualitative comparison of state-of-the-art methods on the HO3D dataset. In this scenario, the anchor image (left) shows only partially visible objects, while the objects in the query images are hidden by occlusion or seen from different viewing angles. This is the most challenging case for matching. GeDi, being a depth-based method, shows ambiguity on objects whose asymmetry is apparent only in RGB.
  • Figure 5: Qualitative comparison of state-of-the-art methods on the YCBInEOAT dataset. In this scenario, the anchor image (left) shows only partially visible objects, while the objects in the query images are hidden by occlusion or seen from different viewing angles. This is the most challenging case for matching. GeDi, being a depth-based method, shows ambiguity on objects whose asymmetry is apparent only in RGB.
  • ...and 1 more figure
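
The Figure 2 caption describes two geometric relations that a short sketch can make concrete: scaling the normalized shape $O_N$ to the metric-scale shape $O_M$, and relating the object pose in the anchor image $I_A$ to the pose in the query image $I_Q$. The code below is only an illustration under stated assumptions, not the authors' implementation. It assumes poses are 4x4 rigid transforms mapping object coordinates to camera coordinates, that size is a per-axis scale in meters, and that $O_N$ lives in a unit cube. All function names (`make_pose`, `invert_pose`, `relative_pose`, `to_metric`) and the numeric values are hypothetical.

```python
import math


def rot_z(theta):
    """3x3 rotation about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def make_pose(R, t):
    """Pack rotation R (3x3) and translation t (length 3) into a 4x4 rigid transform."""
    return [R[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def invert_pose(T):
    """Invert a rigid transform using [R t]^-1 = [R^T, -R^T t]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return make_pose(Rt, t_inv)


def relative_pose(T_anchor, T_query):
    """Transform that maps anchor-camera coordinates to query-camera coordinates.

    Assumes both inputs map object coordinates to their camera frame,
    so that relative_pose(T_A, T_Q) @ T_A == T_Q.
    """
    return matmul(T_query, invert_pose(T_anchor))


def to_metric(points_normalized, size):
    """Scale normalized points by a per-axis metric size (sx, sy, sz), in meters."""
    return [[p[i] * size[i] for i in range(3)] for p in points_normalized]


# Hypothetical example values (not taken from the paper).
T_A = make_pose(rot_z(0.0), [0.0, 0.0, 1.0])            # object pose in the anchor view
T_Q = make_pose(rot_z(math.pi / 2), [0.2, 0.0, 1.5])    # object pose in the query view
T_rel = relative_pose(T_A, T_Q)

O_N = [[0.5, 0.5, 0.5], [-0.5, -0.5, -0.5]]             # two corners of a unit cube
O_M = to_metric(O_N, (0.1, 0.2, 0.3))                   # metric-scale corners
```

Composing `T_rel` with `T_A` recovers `T_Q`, which is a quick consistency check for any pose convention. The sketch is kept dependency-free, so it uses nested lists; an array library would be the more usual choice in practice.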