
UnCommon Objects in 3D

Xingchen Liu, Piyush Tayal, Jianyuan Wang, Jesus Zarzar, Tom Monnier, Konstantinos Tertikas, Jiali Duan, Antoine Toisoul, Jason Y. Zhang, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotny

TL;DR

UnCommon Objects in 3D (uCO3D) is a high-quality, real-world, object-centric 360-degree video dataset with extensive 3D annotations, captions, and Gaussian Splat reconstructions across over 1,000 categories and 170,000 scenes. It combines VGGSfM-based camera poses, depth maps, sparse/dense point clouds, and canonical-view 3DGS to support robust learning and re-rendering, validated through improved performance on few-view 3D reconstruction, novel-view diffusion, and text-to-3D tasks. The work demonstrates that training on uCO3D yields stronger models than training on prior datasets (MVImgNet, CO3Dv2) and enables re-shooting 3DGS from canonical views to adapt real data for Instant3D-like pipelines. This dataset thus provides a practical, scalable resource for real-world 3D deep learning and generative modelling with broad applicability in digital twinning and 3D content creation.
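Re-shooting a 3DGS reconstruction from canonical views amounts to placing virtual cameras at fixed viewpoints around the object and rendering the splats from each one. The minimal sketch below only computes such camera poses. It assumes the object is centered at the origin of a normalized scene frame with +z up, an OpenCV-style camera convention (x right, y down, z forward), and a hypothetical four-view ring (azimuths 0°, 90°, 180°, 270° at 20° elevation, radius 2). These values are illustrative and not taken from uCO3D, and rendering is left to whichever 3DGS rasterizer is at hand.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(a):
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / n, a[1] / n, a[2] / n)


def look_at(eye, target=(0.0, 0.0, 0.0), up=(0.0, 0.0, 1.0)):
    """World-to-camera (R, t) such that x_cam = R x_world + t.

    Assumed OpenCV-style axes: x right, y down, z forward (toward the target).
    """
    z = _normalize(_sub(target, eye))   # forward axis points at the target
    x = _normalize(_cross(z, up))       # right axis
    y = _cross(z, x)                    # down axis; unit length because z, x are orthonormal
    R = [list(x), list(y), list(z)]     # rows are the camera axes in world coordinates
    t = [-sum(R[i][j] * eye[j] for j in range(3)) for i in range(3)]
    return R, t


def canonical_cameras(radius=2.0, elevation_deg=20.0,
                      azimuths_deg=(0.0, 90.0, 180.0, 270.0)):
    """Hypothetical canonical rig: cameras on a ring, all looking at the origin."""
    el = math.radians(elevation_deg)
    cams = []
    for az_deg in azimuths_deg:
        az = math.radians(az_deg)
        eye = (radius * math.cos(el) * math.cos(az),
               radius * math.cos(el) * math.sin(az),
               radius * math.sin(el))
        R, t = look_at(eye)
        cams.append((eye, R, t))
    return cams
```

Because every camera looks at the origin, the object center lands on each optical axis at depth equal to the radius, so all canonical renders are framed consistently regardless of how the original video was shot.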

Abstract

We introduce Uncommon Objects in 3D (uCO3D), a new object-centric dataset for 3D deep learning and 3D generative AI. uCO3D is the largest publicly available collection of high-resolution videos of objects with 3D annotations that ensures full 360° coverage. uCO3D is significantly more diverse than MVImgNet and CO3Dv2, covering more than 1,000 object categories. It is also of higher quality, due to extensive quality checks of both the collected videos and the 3D annotations. Similar to analogous datasets, uCO3D contains annotations for 3D camera poses, depth maps and sparse point clouds. In addition, each object is equipped with a caption and a 3D Gaussian Splat reconstruction. We train several large 3D models on MVImgNet, CO3Dv2, and uCO3D and obtain superior results using the latter, showing that uCO3D is better for learning applications.
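Depth maps and camera poses together determine a point cloud, since each pixel with a valid depth can be lifted into world coordinates. The minimal sketch below makes several assumptions that are not stated by uCO3D: a pinhole camera with focal lengths `fx`, `fy` and principal point `cx`, `cy`; z-depth (distance along the optical axis); pixel coordinates taken at integer positions; and a world-to-camera pose `x_cam = R x_world + t`. The function name and the list-of-rows layout are hypothetical and do not reflect the dataset's loader.

```python
def unproject_depth(depth, fx, fy, cx, cy, R, t):
    """Lift a depth map (list of rows) to world-space points.

    Assumes z-depth, pinhole intrinsics, and x_cam = R x_world + t.
    Pixels with depth <= 0 or NaN are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if not (z > 0):  # NaN > 0 is False, so NaN is rejected too
                continue
            # Back-project pixel (u, v) to camera coordinates.
            xc = (u - cx) * z / fx
            yc = (v - cy) * z / fy
            pc = (xc - t[0], yc - t[1], z - t[2])
            # x_world = R^T (x_cam - t); R is orthonormal, so R^T is its inverse.
            pw = tuple(sum(R[j][i] * pc[j] for j in range(3)) for i in range(3))
            points.append(pw)
    return points
```

Fusing the outputs of this function across all frames of a video gives a dense point cloud in the shared world frame defined by the camera poses.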

Paper Structure

This paper contains 30 sections, 8 figures, and 4 tables.

Figures (8)

  • Figure 1: We introduce UnCommon Objects in 3D (uCO3D), a large and diverse dataset of high-quality 360° videos covering over 1,000 object categories. Each video frame is 3D-annotated with accurate SfM cameras, a point cloud, and a 3D Gaussian Splatting reconstruction.
  • Figure 2: Statistics of uCO3D. (Left) We plot the number of objects per super-category. In total, the dataset contains 50 super-categories, each gathering around 20 sub-categories. (Right) We show a word cloud of all 1,070 visual categories represented in the dataset.
  • Figure 3: Data annotation overview. Each scene in uCO3D is reconstructed in three different ways: a) per-frame cameras with a sparse point cloud computed by VGGSfM (wang2024vggsfm), b) a semi-dense point cloud comprising triangulations of additional, denser tracks from VGGSfM's tracker, and c) a 3D Gaussian Splat reconstruction (kerbl233d-gaussian) optimized separately for each scene.
  • Figure 4: Data collection example. For each video, the cameras follow a sine-wave trajectory to ensure good viewpoint coverage.
  • Figure 5: 3D reconstruction comparison. We show results of LightplaneLRM (cao2024lightplane) models trained on MVImgNet, CO3Dv2 and uCO3D.
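Figure 4 describes cameras that follow a sine-wave trajectory around the object to ensure good viewpoint coverage. One plausible parameterization, assumed here rather than taken from the paper, sweeps the azimuth once around the object while the elevation oscillates sinusoidally, so the video alternates between lower and higher viewpoints. The function name, the radius, the base elevation, the amplitude, and the number of oscillations are all hypothetical, illustrative parameters.

```python
import math


def sine_wave_trajectory(n_frames, radius=1.5, base_elev_deg=20.0,
                         amp_deg=15.0, n_waves=3):
    """Return camera centers on a sphere around the origin (hypothetical parameterization).

    Azimuth sweeps a full 360 degrees over the video; elevation oscillates
    as base_elev_deg + amp_deg * sin(n_waves * azimuth).
    """
    centers = []
    for i in range(n_frames):
        az = 2.0 * math.pi * i / n_frames  # full turn, endpoint excluded
        el = math.radians(base_elev_deg + amp_deg * math.sin(n_waves * az))
        centers.append((
            radius * math.cos(el) * math.cos(az),
            radius * math.cos(el) * math.sin(az),
            radius * math.sin(el),
        ))
    return centers
```

Each center can be turned into a full camera pose by orienting the camera toward the object center. Compared with a flat circular orbit, the oscillating elevation samples a band of the viewing sphere rather than a single ring.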