OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation

Tong Wu; Jiarui Zhang; Xiao Fu; Yuxin Wang; Jiawei Ren; Liang Pan; Wayne Wu; Lei Yang; Jiaqi Wang; Chen Qian; Dahua Lin; Ziwei Liu

TL;DR

OmniObject3D addresses the scarcity of large-scale real-scanned 3D datasets by introducing 6,000 real-world textured meshes across 190 categories, with rich annotations including point clouds, multi-view images, and videos. The dataset enables four evaluation tracks—robust 3D perception, novel-view synthesis, neural surface reconstruction, and 3D object generation—and is demonstrated through extensive experiments that probe robustness to OOD styles and corruptions, cross-scene generalization for NeRF-based methods, sparse-view reconstruction, and large-vocabulary generation dynamics. Key findings reveal gaps in current robustness and reconstruction approaches, highlight generalizable priors from cross-scene data, and uncover generation biases and trade-offs in a broad category space. Overall, OmniObject3D provides a versatile, high-fidelity benchmark and data resource with significant potential to advance realistic 3D perception, reconstruction, and generation in real-world settings.

Abstract

Recent advances in modeling 3D objects mostly rely on synthetic datasets due to the lack of large-scale real-scanned 3D databases. To facilitate the development of 3D perception, reconstruction, and generation in the real world, we propose OmniObject3D, a large-vocabulary 3D object dataset with massive high-quality real-scanned 3D objects. OmniObject3D has several appealing properties: 1) Large Vocabulary: It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets (e.g., ImageNet and LVIS), benefiting the pursuit of generalizable 3D representations. 2) Rich Annotations: Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multi-view rendered images, and multiple real-captured videos. 3) Realistic Scans: The professional scanners support high-quality object scans with precise shapes and realistic appearances. With the vast exploration space offered by OmniObject3D, we carefully set up four evaluation tracks: a) robust 3D perception, b) novel-view synthesis, c) neural surface reconstruction, and d) 3D object generation. Extensive studies are performed on these four benchmarks, revealing new observations, challenges, and opportunities for future research in realistic 3D vision.

Paper Structure (22 sections, 21 figures, 12 tables)

Introduction
Related Works
The OmniObject3D Dataset
Data Collection, Processing, and Annotation
Statistics and Distribution
Experiments
Robust 3D Perception
Novel View Synthesis
Neural Surface Reconstruction
3D Object Generation
Conclusion and Outlook
Additional Information of OmniObject3D
Related Works
Additional Experimental Results
Robust 3D Perception
...and 7 more sections

Figures (21)

Figure 1: OmniObject3D is a large vocabulary 3D object dataset with massive high-quality real-scanned 3D objects and rich annotations. It supports various research topics, e.g., perception, novel view synthesis, neural surface reconstruction, and 3D generation.
Figure 2: Semantic distribution of the OmniObject3D dataset. It covers 190 daily categories with a long-tailed distribution, sharing common classes with popular 2D and 3D datasets.
Figure 3: OmniObject3D provides the first clean real-world point cloud object dataset and allows fine-grained analysis on robustness to OOD styles and OOD corruptions. "-C": corrupted by common corruptions described in ren2022modelnet-c.
Figure 4: Neural surface reconstruction results for both dense-view and sparse-view settings.
Figure 5: Performance distribution of dense-view surface reconstruction. The averaged results of the three methods are imbalanced. The colored area denotes a smoothed range of results.
...and 16 more figures
