3D-COCO: extension of MS-COCO dataset for image detection and 3D reconstruction modules

Maxence Bideaux, Alice Phe, Mohamed Chaouch, Bertrand Luvison, Quoc-Cuong Pham

TL;DR

3D-COCO tackles the lack of datasets linking 2D object detection with 3D CAD models by extending MS-COCO with a large, aligned set of 3D shapes from ShapeNet and Objaverse. It introduces an IoU-based automatic 2D-3D matching pipeline and renders 62 views per model to support both 2D detection and 3D reconstruction tasks. The result is a publicly available, open dataset of about 27,760 CAD models across 80 COCO classes, with alignments for hundreds of thousands of annotations, enabling 3D-configurable detection and multi-view or single-view reconstruction research. This dataset paves the way for integrating real images into 3D reconstruction pipelines and invites future improvements in alignment and model coverage.

Abstract

We introduce 3D-COCO, an extension of the original MS-COCO dataset providing 3D models and 2D-3D alignment annotations. 3D-COCO was designed to support computer vision tasks such as 3D reconstruction or image detection configurable with textual, 2D image, and 3D CAD model queries. We complete the existing MS-COCO dataset with 28K 3D models collected from ShapeNet and Objaverse. Using an IoU-based method, we match each MS-COCO annotation with the best 3D models to provide a 2D-3D alignment. The open-source nature of 3D-COCO is a first that should pave the way for new research on 3D-related topics. The dataset and its source code are available at https://kalisteo.cea.fr/index.php/coco3d-object-detection-and-reconstruction/

Paper Structure (8 sections, 4 figures, 2 tables)

This paper contains 8 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: 3D CAD models data collection from ShapeNet and Objaverse (left) and pre-processing steps including centering, conversion (upper right), and 2D rendering (lower right). Textured renderings display the model with colors and textures, synthetic renderings display the model in a uniform gray color, depth map renderings display the nearest model points in darker colors, and binary mask renderings display the silhouette of the model.
  • Figure 2: Matching between MS-COCO 2D annotations and 3D models using our automatic class-driven retrieval method. First, the MS-COCO annotation binary mask image is extracted from the image (left). Then, the annotation label is used to select the 3D models of the same class and their 62 binary masks (middle). Finally, IoU is calculated between the MS-COCO annotation mask and each CAD binary mask: models with the highest IoU score are saved as the best matching models in the 3D-COCO annotation file.
  • Figure 3: Example of difficult scenarios from MS-COCO images and annotations with their associated flag names.
  • Figure 4: MS-COCO images with Truck and Horse annotations followed by their 3 best matching models predicted by our IoU-based retrieval method.
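
The class-driven retrieval described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' released code: the function names (`mask_iou`, `best_matching_models`), the dictionary layout of a model record, and the `top_k=3` default (chosen to mirror the 3 best matches shown in Figure 4) are all assumptions. It also assumes the annotation mask and every rendered view mask have already been cropped and resized to the same shape, a step the captions do not detail.

```python
import numpy as np


def mask_iou(mask_a, mask_b):
    """Intersection over union of two binary masks with identical shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Two empty masks: define IoU as 0 to avoid dividing by zero.
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def best_matching_models(annotation_mask, annotation_label, models, top_k=3):
    """Rank same-class CAD models by their best per-view IoU.

    `models` is a hypothetical list of dicts with keys:
      "id"         -> model identifier
      "label"      -> class name
      "view_masks" -> iterable of binary masks (e.g. the 62 rendered views)
    Returns a list of (model_id, iou) pairs, highest IoU first.
    """
    scored = []
    for model in models:
        # Class-driven step: only consider models sharing the annotation label.
        if model["label"] != annotation_label:
            continue
        # Keep the best score over all rendered views of this model.
        best_iou = max(
            (mask_iou(annotation_mask, view) for view in model["view_masks"]),
            default=0.0,
        )
        scored.append((model["id"], best_iou))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Filtering by label before scoring keeps the search linear in the number of same-class models times the number of views, rather than scanning the whole model collection for every annotation.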