Pre-training with 3D Synthetic Data: Learning 3D Point Cloud Instance Segmentation from 3D Synthetic Scenes
Daichi Otsuka, Shinichi Mae, Ryosuke Yamada, Hirokatsu Kataoka
TL;DR
The paper tackles the data bottleneck in 3D point cloud instance segmentation by introducing a 3D synthetic data pre-training approach that uses a single generative model, Point-E, to augment ScanNetV2 scenes. The authors pre-train a Mask3D model on this synthetically augmented dataset and fine-tune it on S3DIS, showing substantial performance gains over training from scratch and over plain ScanNetV2 pre-training, with additional gains from even minimal object-instance expansion. They demonstrate that inserting up to two Point-E generated objects per scene yields measurable improvements, including notable boosts for small objects. The work highlights the viability of text-to-3D generation for scalable pre-training in 3D perception, and suggests that newer generative models and broader synthetic-data strategies could further reduce real-data requirements in robotics and autonomous systems.
Abstract
In recent years, the research community has made growing use of 3D point cloud data because of its applicability to a wide range of real-world tasks. Point clouds capture the actual size of objects and support spatial understanding, which makes them useful for the mechanical control of robots, vehicles, and other real-world systems. Along this line, we aim to improve 3D point cloud instance segmentation, which has emerged as a particularly promising approach for these applications. However, creating 3D point cloud datasets is far more costly than creating 2D image datasets. Training a 3D point cloud instance segmentation model requires not only category labels but also detailed annotations for every point in a large-scale 3D space. Meanwhile, the growing number of generative models in the 3D domain has spurred proposals to use them to create 3D point cloud data. In this work, we propose pre-training with 3D synthetic data: a 3D point cloud instance segmentation model is trained on 3D scenes, represented as point clouds, that are augmented by a generative model. We generate 3D point cloud data directly with Point-E and insert the generated objects into 3D scenes. Although more accurate 3D generation models are available as of 2025, even Point-E, an early 3D generative model, can effectively support pre-training with 3D synthetic data. In our experiments, the proposed pre-training method outperforms the baseline methods, demonstrating the efficacy of 3D generative models for 3D point cloud instance segmentation.
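The insertion step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `insert_object`, the `target_size` parameter, the random floor-level placement, and the "next free ID" labelling rule are all assumptions made for illustration, and the object points are a random stand-in rather than real Point-E output.

```python
import numpy as np

def insert_object(scene_xyz, scene_inst, obj_xyz, target_size=0.5, rng=None):
    """Place one object point cloud into a scene and give it a new instance ID.

    Hypothetical helper: placement and scaling rules are assumptions,
    not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Centre the object, then rescale so its longest side equals target_size.
    obj = obj_xyz - obj_xyz.mean(axis=0)
    extent = obj.max(axis=0) - obj.min(axis=0)
    obj = obj * (target_size / extent.max())

    # Pick a random x-y location inside the scene's bounding box.
    lo, hi = scene_xyz.min(axis=0), scene_xyz.max(axis=0)
    xy = rng.uniform(lo[:2], hi[:2])

    # Rest the object's lowest point on the scene's lowest z (a crude floor estimate).
    obj = obj + np.array([xy[0], xy[1], lo[2] - obj[:, 2].min()])

    # Label every inserted point with an ID one larger than any existing ID.
    new_id = int(scene_inst.max()) + 1
    obj_inst = np.full(len(obj), new_id, dtype=scene_inst.dtype)

    xyz = np.concatenate([scene_xyz, obj], axis=0)
    inst = np.concatenate([scene_inst, obj_inst], axis=0)
    return xyz, inst, new_id

# Toy usage with random data standing in for a scanned scene and a generated object.
rng = np.random.default_rng(0)
scene_xyz = rng.uniform([0, 0, 0], [4, 3, 2.5], size=(1000, 3))
scene_inst = rng.integers(0, 5, size=1000)
obj_xyz = rng.normal(size=(256, 3))
xyz, inst, new_id = insert_object(scene_xyz, scene_inst, obj_xyz, rng=rng)
```

Because the inserted points receive their instance label at insertion time, the synthetic object needs no manual per-point annotation. Calling the function twice on the same scene would correspond to the two-objects-per-scene setting mentioned in the TL;DR. The sketch does not check for collisions with existing scene geometry.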
