OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation
Tong Wu, Jiarui Zhang, Xiao Fu, Yuxin Wang, Jiawei Ren, Liang Pan, Wayne Wu, Lei Yang, Jiaqi Wang, Chen Qian, Dahua Lin, Ziwei Liu
TL;DR
OmniObject3D addresses the scarcity of large-scale real-scanned 3D datasets by introducing 6,000 real-world textured meshes across 190 categories, with rich annotations including point clouds, multi-view images, and videos. The dataset supports four evaluation tracks: robust 3D perception, novel-view synthesis, neural surface reconstruction, and 3D object generation. Extensive experiments on these tracks probe robustness to out-of-distribution (OOD) styles and corruptions, cross-scene generalization for NeRF-based methods, sparse-view reconstruction, and large-vocabulary generation dynamics. Key findings reveal gaps in current robustness and reconstruction approaches, highlight generalizable priors from cross-scene data, and uncover generation biases and trade-offs in a broad category space. Overall, OmniObject3D provides a versatile, high-fidelity benchmark and data resource with significant potential to advance realistic 3D perception, reconstruction, and generation in real-world settings.
Abstract
Recent advances in modeling 3D objects mostly rely on synthetic datasets due to the lack of large-scale real-scanned 3D databases. To facilitate the development of 3D perception, reconstruction, and generation in the real world, we propose OmniObject3D, a large-vocabulary 3D object dataset with massive high-quality real-scanned 3D objects. OmniObject3D has several appealing properties: 1) Large Vocabulary: It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets (e.g., ImageNet and LVIS), benefiting the pursuit of generalizable 3D representations. 2) Rich Annotations: Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multi-view rendered images, and multiple real-captured videos. 3) Realistic Scans: The professional scanners support high-quality object scans with precise shapes and realistic appearances. With the vast exploration space offered by OmniObject3D, we carefully set up four evaluation tracks: a) robust 3D perception, b) novel-view synthesis, c) neural surface reconstruction, and d) 3D object generation. Extensive studies are performed on these four benchmarks, revealing new observations, challenges, and opportunities for future research in realistic 3D vision.
