Cut-and-Splat: Leveraging Gaussian Splatting for Synthetic Data Generation

Bram Vanherle, Brent Zoomers, Jeroen Put, Frank Van Reeth, Nick Michiels

TL;DR

This work tackles the domain-gap challenge in synthetic data for instance segmentation by introducing Cut-and-Splat, a pipeline that uses Gaussian Splatting to render foreground objects from a short video into contextually plausible backgrounds without textured 3D models. Foreground extraction, depth-guided placement on plausible background surfaces, and lighting augmentations produce realistic training images with corresponding bounding boxes and masks. Empirical evaluation on the IBSYD dataset shows that Cut-and-Splat outperforms Cut-and-Paste and diffusion-based data generation, with ablations confirming the importance of smart placement and appearance augmentation. The method provides a practical, automated route to high-quality, object-specific synthetic data, reducing annotation cost and enabling domain-specific training, while future work could address relighting, transparency, and multi-pose scenarios.

Abstract

Generating synthetic images is a useful method for cheaply obtaining labeled data for training computer vision models. However, obtaining accurate 3D models of relevant objects is necessary, and the resulting images often have a gap in realism due to challenges in simulating lighting effects and camera artifacts. We propose using the novel view synthesis method called Gaussian Splatting to address these challenges. We have developed a synthetic data pipeline for generating high-quality context-aware instance segmentation training data for specific objects. This process is fully automated, requiring only a video of the target object. We train a Gaussian Splatting model of the target object and automatically extract the object from the video. Leveraging Gaussian Splatting, we then render the object on a random background image, and monocular depth estimation is employed to place the object in a believable pose. We introduce a novel dataset to validate our approach and show superior performance over other data generation approaches, such as Cut-and-Paste and Diffusion model-based generation.
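
The final step described above, rendering the object over a background and obtaining labels, can be illustrated with a minimal sketch. It assumes the renderer returns an RGB image and a per-pixel opacity map for the object at the resolution of the background. The function name `composite_and_label` and the `alpha_threshold` parameter are hypothetical, chosen for illustration, and are not taken from the paper.

```python
import numpy as np


def composite_and_label(background, fg_rgb, fg_alpha, alpha_threshold=0.5):
    """Alpha-blend a rendered object over a background and derive labels.

    background: (H, W, 3) float array in [0, 1]
    fg_rgb:     (H, W, 3) float array in [0, 1], the rendered object colors
    fg_alpha:   (H, W) float array in [0, 1], the rendered object opacity
    Returns the composited image, a boolean instance mask, and a bounding
    box (x_min, y_min, x_max, y_max) in pixels, or None if nothing is visible.
    """
    alpha = fg_alpha[..., None]  # add a channel axis so it broadcasts over RGB
    image = alpha * fg_rgb + (1.0 - alpha) * background

    # Assumption: a pixel belongs to the object if its opacity exceeds a threshold.
    mask = fg_alpha > alpha_threshold
    if not mask.any():
        return image, mask, None

    ys, xs = np.nonzero(mask)
    bbox = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return image, mask, bbox
```

Because the mask and bounding box come directly from the rendered opacity, the annotations require no manual labeling, which is the property that makes this kind of pipeline inexpensive.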

Paper Structure

This paper contains 14 sections, 14 figures, 2 tables.

Figures (14)

  • Figure 1: Our approach extracts foreground objects from trained Gaussian Splatting models and places them in plausible positions in background images to create high-quality synthetic images for training instance segmentation models.
  • Figure 2: An overview of our method for easily creating realistic synthetic data. First, a Gaussian Splatting model is trained on a simple input video, and the part of the model representing the foreground is extracted. Second, an arbitrary background image is taken, and its depth is estimated to find feasible placement positions. The Gaussian Splatting model is then used to render the foreground object in a plausible pose.
  • Figure 3: A plane filter, a statistical filter, and a cluster filter are applied in sequence to extract the plant object from the point cloud representation of the trained Gaussian Splatting model. Red marks points that are selected for removal.
  • Figure 4: Illustration of possible object placements in the background images computed by our approach. For each image, we show 1000 possible placement positions, indicated by colored dots. A different color indicates a different plane.
  • Figure 5: The red bottle is rendered on the floor in the background (right). The depth map (left) computed by Depth Anything is used to realistically occlude the object behind the cable.
  • ...and 9 more figures
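
The depth-based occlusion described in the caption of Figure 5 can be sketched as a per-pixel depth test. This is an illustrative sketch only: it assumes the background depth map and the rendered object depth have already been brought to a common scale in which smaller values are closer to the camera, and how the paper aligns the two is not stated in this excerpt. The function name `occlusion_aware_composite` and all parameter names are hypothetical.

```python
import numpy as np


def occlusion_aware_composite(background, bg_depth, fg_rgb, fg_alpha, fg_depth):
    """Composite a rendered object, hiding it where the background is closer.

    background: (H, W, 3) float array in [0, 1]
    bg_depth:   (H, W) estimated background depth (assumed: smaller = closer)
    fg_rgb:     (H, W, 3) rendered object colors in [0, 1]
    fg_alpha:   (H, W) rendered object opacity in [0, 1]
    fg_depth:   (H, W) rendered object depth, same scale as bg_depth
    Returns the composited image and the visible-object boolean mask.
    """
    # The object is visible only where it lies in front of the background.
    in_front = fg_depth < bg_depth
    alpha = np.where(in_front, fg_alpha, 0.0)

    a = alpha[..., None]  # add a channel axis so it broadcasts over RGB
    image = a * fg_rgb + (1.0 - a) * background

    # The mask covers only the unoccluded part of the object.
    visible_mask = alpha > 0.5
    return image, visible_mask
```

Deriving the instance mask after the depth test means the label excludes pixels hidden behind background structures such as the cable in Figure 5, so the annotation matches what is actually visible in the image.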