Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation

Mengchen Zhang, Tong Wu, Tai Wang, Tengfei Wang, Ziwei Liu, Dahua Lin

TL;DR

Omni6D addresses the limited category coverage of existing 6D pose datasets by introducing a large-vocabulary RGBD benchmark with 166 categories, 4688 instances, and over 0.8 million renders, complemented by symmetry-aware evaluation and a canonical-pose annotation scheme. It benchmarks both implicit and explicit category-level pose methods on Omni6D and the extended Omni6D-xl, analyzes generalization to unseen categories, and demonstrates a practical strategy for fine-tuning from a limited set of categories. The authors also present Omni6D-Real to bridge the sim-to-real gap, and show that real-scanned objects improve realism while enabling joint training and better cross-domain transfer. Taken together, Omni6D, Omni6D-xl, and Omni6D-Real provide a comprehensive, realistic, and scalable platform for large-vocabulary, category-level 6D pose estimation in both industry and research.

Abstract

6D object pose estimation aims at determining an object's translation, rotation, and scale, typically from a single RGBD image. Recent advancements have expanded this estimation from instance-level to category-level, allowing models to generalize across unseen instances within the same category. However, this generalization is limited by the narrow range of categories covered by existing datasets, such as NOCS, which also tend to overlook common real-world challenges like occlusion. To tackle these challenges, we introduce Omni6D, a comprehensive RGBD dataset featuring a wide range of categories and varied backgrounds, elevating the task to a more realistic context. 1) The dataset comprises an extensive spectrum of 166 categories, 4688 instances adjusted to the canonical pose, and over 0.8 million captures, significantly broadening the scope for evaluation. 2) We introduce a symmetry-aware metric and conduct systematic benchmarks of existing algorithms on Omni6D, offering a thorough exploration of new challenges and insights. 3) Additionally, we propose an effective fine-tuning approach that adapts models from previous datasets to our extensive vocabulary setting. We believe this initiative will pave the way for new insights and substantial progress in both the industrial and academic fields, pushing forward the boundaries of general 6D pose estimation.

Paper Structure

This paper contains 30 sections, 1 equation, 18 figures, 17 tables, and 1 algorithm.

Figures (18)

  • Figure 1: Symmetry statistics. The figure demonstrates different symmetry cases using object instances and provides a quantitative representation of the occurrence frequency for various combinations of distinct symmetry cases across the xyz-axes.
  • Figure 2: Omni6D analysis. (a) Distribution of point cloud centroids, (b) distribution of object centroids on (top) normalized image, XY-plane, and (bottom) normalized depth, XZ-plane, (c) density of relative 2D object size, (d) density of angular deviation from the upward direction, (e) Omni6D dataset clustering results. The angle of each sector in the chart reflects the relative size of the instance count within that category.
  • Figure 3: Challenges from Omni6D. (a) Algorithms trained on Omni6D can overcome challenges in estimating poses for occluded object instances. The left image shows an occluded object instance at the edge of the image, while the right image shows an object instance obstructed by other objects. (b) Algorithms trained on Omni6D can accurately estimate poses from only the lower half or bottom appearance of an object. Green and red denote the ground-truth and predicted 3D bounding boxes, respectively. The blue and orange lines on the boxes respectively highlight the intersecting lines of the frontal face and the top face of the two 3D bounding boxes, while the darker lines indicate the bottom of the bounding boxes.
  • Figure 4: Category-wise performance on the Omni6D dataset. From left to right, the x-axes represent: the number of objects within a category (Semantic Category), the number of objects within a cluster formed from shape priors (Shape Category), and the diversity of instances within a category. The y-axis shows per-category or per-cluster results for the IoU$_{75}$ and $5^\circ\,2\,\mathrm{cm}$ metrics. Each plotted point is the algorithm's result for a specific category or cluster, and the line shows the linear fit to the scattered points.
  • Figure 5: Our fine-tuning strategy. (a) Category inventory of cls$n$ within the Omni6D dataset. The angle of each sector in the chart reflects the relative size of the instance count within that category. (b) In each fine-tuning step, we double the category count, copying the trained global features and old category parameters into the new network while initializing the new category parameters. Deeper color indicates a larger number of training iterations.
  • ...and 13 more figures
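
The parameter-copying step described in the Figure 5 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `expand_category_params`, the dictionary of shared ("global") weights, the per-category parameter table of shape `(num_categories, dim)`, and the small Gaussian initialization for new rows are all hypothetical choices made for the example.

```python
import numpy as np


def expand_category_params(global_params, category_params, new_num_categories,
                           rng=None, init_scale=0.01):
    """Grow a per-category parameter table for one fine-tuning step.

    global_params:      dict of name -> array, weights shared by all categories
                        (hypothetical layout, assumed for illustration).
    category_params:    array of shape (old_num_categories, dim), one row per
                        already-trained category (hypothetical layout).
    new_num_categories: size of the enlarged vocabulary, e.g. twice the old one.
    """
    rng = np.random.default_rng() if rng is None else rng
    old_num, dim = category_params.shape
    if new_num_categories < old_num:
        raise ValueError("the new vocabulary must not be smaller than the old one")

    # Copy the trained shared weights into the new network unchanged.
    new_global = {name: value.copy() for name, value in global_params.items()}

    # Copy the rows of the old categories, then freshly initialize the new rows.
    new_category = np.empty((new_num_categories, dim), dtype=category_params.dtype)
    new_category[:old_num] = category_params
    new_category[old_num:] = rng.normal(
        0.0, init_scale, size=(new_num_categories - old_num, dim)
    )
    return new_global, new_category


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shared = {"backbone": rng.normal(size=(16, 16))}
    per_category = rng.normal(size=(4, 8))  # 4 trained categories (made-up size)
    # Double the category count, as in each fine-tuning step of the caption.
    shared, per_category = expand_category_params(
        shared, per_category, 2 * per_category.shape[0], rng=rng
    )
    print(per_category.shape)
```

Repeating the call once per fine-tuning step, doubling `new_num_categories` each time, mirrors the schedule in the caption; how the real method stores category-specific parameters, and how it initializes them, may differ.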