Pre-training with 3D Synthetic Data: Learning 3D Point Cloud Instance Segmentation from 3D Synthetic Scenes

Daichi Otsuka, Shinichi Mae, Ryosuke Yamada, Hirokatsu Kataoka

TL;DR

The paper tackles the data bottleneck in 3D point cloud instance segmentation by introducing a pre-training approach based on 3D synthetic data, in which a single generative model, Point-E, is used to augment ScanNetV2 scenes. The authors pre-train a Mask3D model on this synthetically augmented dataset and fine-tune it on S3DIS, showing substantial performance gains over training from scratch and over plain ScanNetV2 pre-training, with additional gains from even minimal object-instance expansion. They demonstrate that inserting up to two Point-E generated objects per scene yields measurable improvements, including notable boosts for small objects. The work highlights the viability of text-to-3D generation for scalable pre-training in 3D perception, hinting at further improvements with newer generative models and broader synthetic-data strategies to reduce real-data requirements in robotics and autonomous systems.

Abstract

In recent years, the research community has witnessed growing use of 3D point cloud data owing to its high applicability in various real-world applications. The point cloud modality makes it possible to account for actual size and to support spatial understanding. Application fields include the mechanical control of robots, vehicles, and other real-world systems. Along this line, we aim to improve 3D point cloud instance segmentation, which has emerged as a particularly promising approach for these applications. However, creating 3D point cloud datasets entails enormous costs compared to 2D image datasets. To train a 3D point cloud instance segmentation model, it is necessary not only to assign categories but also to provide detailed annotations for each point in a large-scale 3D space. Meanwhile, the growing number of generative models in the 3D domain has spurred proposals for using a generative model to create 3D point cloud data. In this work, we propose pre-training with 3D synthetic data to train a 3D point cloud instance segmentation model, based on a generative model for 3D scenes represented as point cloud data. We directly generate 3D point cloud data with Point-E and insert the generated data into a 3D scene. Although other accurate 3D generation models exist as of 2025, even Point-E, an early 3D generative model, can effectively support pre-training with 3D synthetic data. In the experimental section, we compare our pre-training method with baseline methods and observe improved performance, demonstrating the efficacy of 3D generative models for 3D point cloud instance segmentation.

Paper Structure

This paper contains 8 sections, 1 figure, 2 tables.

Figures (1)

  • Figure 1: 3D scene expansion utilising a single 3D generative model. In the present paper, we employ Point-E as a 3D object generative model to automatically synthesize 3D object instances. By placing the generated 3D object instances in existing 3D scene data (e.g., ScanNetV2), the scene data is effectively extended. In the 3D generative pre-training pipeline, we use Point-E to generate 3D object instances and expand ScanNetV2 by randomly placing them into 3D scenes. During insertion, the center of gravity of each 3D object is aligned with that of the scene, and random noise is then added to the coordinates of the object's center of gravity. In the pre-training phase, we adopt Mask3D as the 3D point cloud instance segmentation model and pre-train it on the expanded 3D scene dataset. We additionally fine-tune the pre-trained Mask3D model on the S3DIS dataset. We successfully show the effectiveness of '3D generative pre-training', a pre-training method for 3D point cloud instance segmentation based on a single generative model.
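
The insertion step described in the Figure 1 caption (align the centers of gravity, then perturb the object's center with random noise) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `insert_object`, the `noise_scale` parameter, the use of uniform noise on all three axes, and the rule of taking the next unused integer as the new instance ID are all assumptions made for illustration. The sketch also assumes a non-empty scene and ignores collisions with existing geometry.

```python
import numpy as np


def insert_object(scene_xyz, scene_instance_ids, object_xyz,
                  noise_scale=0.5, rng=None):
    """Place one generated object into a scene (illustrative sketch only).

    scene_xyz:          (N, 3) float array of scene point coordinates
    scene_instance_ids: (N,) integer array of per-point instance labels
    object_xyz:         (M, 3) float array, e.g. a generated object
    noise_scale:        hypothetical half-width of the uniform offset
    """
    rng = np.random.default_rng() if rng is None else rng

    # Step 1: align the object's center of gravity with the scene's.
    scene_centroid = scene_xyz.mean(axis=0)
    object_centroid = object_xyz.mean(axis=0)
    aligned = object_xyz - object_centroid + scene_centroid

    # Step 2: shift the whole object by one random offset, which perturbs
    # its center of gravity. The uniform distribution is an assumption.
    offset = rng.uniform(-noise_scale, noise_scale, size=3)
    placed = aligned + offset

    # Step 3: give the inserted points a fresh instance ID and merge.
    new_id = int(scene_instance_ids.max()) + 1
    new_ids = np.full(len(placed), new_id, dtype=scene_instance_ids.dtype)
    merged_xyz = np.concatenate([scene_xyz, placed], axis=0)
    merged_ids = np.concatenate([scene_instance_ids, new_ids], axis=0)
    return merged_xyz, merged_ids
```

Calling the function repeatedly on its own output would insert several objects into the same scene, since each call assigns the next unused instance ID.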