3DCoMPaT$^{++}$: An improved Large-scale 3D Vision Dataset for Compositional Recognition
Habib Slim, Xiang Li, Yuchen Li, Mahmoud Ahmed, Mohamed Ayman, Ujjwal Upadhyay, Ahmed Abdelreheem, Arpit Prajapati, Suhail Pothigara, Peter Wonka, Mohamed Elhoseiny
TL;DR
3DCoMPaT++ introduces a large-scale multimodal 2D/3D dataset of 10M stylized shapes across 41 categories, with 275 fine-grained parts, 43 coarse-grained parts, and 293 materials, rendered to yield 160M views. It provides part- and material-level annotations in both 2D and 3D, along with texture coordinates and a rendering pipeline, enabling the Grounded CoMPaT Recognition (GCR) task that jointly identifies shape categories and their part-material compositions. The paper validates the dataset through 2D/3D classification and segmentation experiments and presents a CVPR 2023 challenge on GCR, offering baselines (e.g., PointNet++, BPNet, SegFormer) and insights into multimodal grounding, while also releasing a toolbox for data access and visualization. This resource supports compositional 3D vision research by enabling multi-view, multimodal analysis of part-material compositions and by providing a benchmark for grounding those compositions across modalities.
Abstract
In this work, we present 3DCoMPaT$^{++}$, a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes carefully annotated at the part-instance level, alongside matching RGB point clouds, 3D textured meshes, depth maps, and segmentation masks. 3DCoMPaT$^{++}$ covers 41 shape categories, 275 fine-grained part categories, and 293 fine-grained material classes that can be compositionally applied to parts of 3D objects. We render a subset of one million stylized shapes from four equally spaced views as well as four randomized views, leading to a total of 160 million renderings. Parts are segmented at the instance level, with coarse-grained and fine-grained semantic levels. We introduce a new task, called Grounded CoMPaT Recognition (GCR), to collectively recognize and ground compositions of materials on parts of 3D objects. Additionally, we report the outcomes of a data challenge organized at CVPR 2023, in which the winning method used a modified PointNet$^{++}$ model trained on 6D inputs, and we explore alternative techniques for improving GCR. We hope our work will help ease future research on compositional 3D vision.
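To make the structure of a GCR output concrete, the following is a minimal sketch of how a prediction could be represented as a shape category plus a set of (part, material) pairs, and compared against a label. The class `Composition`, the helper functions, and the part and material names are hypothetical and chosen for illustration only; they are not taken from the dataset's label set or toolbox, and the scoring shown here is not the benchmark's official metric. The grounding component of the task (localizing each part in 2D or 3D) is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Composition:
    """Hypothetical container for one GCR-style prediction or label."""
    shape: str                  # shape category, e.g. "chair"
    part_materials: frozenset   # set of (part, material) string pairs


def make(shape, pairs):
    """Build a Composition from a shape name and an iterable of pairs."""
    return Composition(shape, frozenset(pairs))


def shape_correct(pred, truth):
    """True when the predicted shape category matches the label."""
    return pred.shape == truth.shape


def pair_recall(pred, truth):
    """Fraction of ground-truth (part, material) pairs that were predicted."""
    if not truth.part_materials:
        return 1.0
    hits = pred.part_materials & truth.part_materials
    return len(hits) / len(truth.part_materials)


def all_correct(pred, truth):
    """True only when the shape and every (part, material) pair match."""
    return shape_correct(pred, truth) and pred.part_materials == truth.part_materials


# Illustrative label and prediction (names are assumptions, not dataset labels).
truth = make("chair", [("seat", "leather"), ("leg", "wood"), ("backrest", "fabric")])
pred = make("chair", [("seat", "leather"), ("leg", "metal"), ("backrest", "fabric")])

print(shape_correct(pred, truth))
print(pair_recall(pred, truth))
print(all_correct(pred, truth))
```

In this toy case the shape is recognized and two of the three part-material pairs match, so the strict all-pairs check fails. This illustrates why recognizing a full composition is harder than recognizing the shape category alone.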
