PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DoF Object Pose Dataset Generation

Lukas Meyer, Floris Erich, Yusuke Yoshiyasu, Marc Stamminger, Noriaki Ando, Yukiyasu Domae

TL;DR

PEGASUS addresses the realism gap in synthetic 6DoF pose data by combining environment and object reconstructions from 3D Gaussian Splatting (3DGS) with physics-based placement to generate diverse static and dynamic scenes. It renders RGB images, depth maps, semantic masks, and precise 6DoF poses in the BOP format, so that pose-estimation networks such as DOPE can be trained on synthetic data and transferred to real imagery. The authors introduce the Ramen dataset and PEGASET to demonstrate scalability with scanned environments and objects, and show that networks trained on PEGASUS data achieve synthetic-to-real transfer in robotic grasping tasks with a UR5 arm. Overall, PEGASUS provides a modular framework for domain-specific dataset generation that can be extended with additional environments, diffusion-based augmentations, and LiDAR-informed 3DGS to further close the reality gap.
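The BOP export mentioned above can be illustrated with a minimal Python sketch of how one simulated object pose might become a ground-truth record. This is not PEGASUS's exporter. It assumes the simulator provides world-frame 4x4 poses in metres and that the camera pose already follows the OpenCV convention used by BOP (x right, y down, z forward); the helpers `pose_matrix`, `model_to_camera`, and `bop_gt_entry` are hypothetical. The field names `cam_R_m2c` (row-major 3x3 rotation), `cam_t_m2c` (translation in millimetres), and `obj_id` follow the BOP `scene_gt.json` layout.

```python
import json

import numpy as np


def pose_matrix(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(R, dtype=float)
    T[:3, 3] = np.asarray(t, dtype=float)
    return T


def model_to_camera(T_world_obj, T_world_cam):
    """Express the object pose in the camera frame: inv(T_world_cam) @ T_world_obj."""
    return np.linalg.inv(T_world_cam) @ T_world_obj


def bop_gt_entry(obj_id, T_cam_obj, metres_to_mm=1000.0):
    """One scene_gt.json record (hypothetical helper; assumes poses in metres)."""
    return {
        "cam_R_m2c": T_cam_obj[:3, :3].flatten().tolist(),  # row-major 3x3 rotation
        "cam_t_m2c": (T_cam_obj[:3, 3] * metres_to_mm).tolist(),  # BOP uses millimetres
        "obj_id": int(obj_id),
    }


# Usage: a camera at the world origin sees object 1 placed 0.5 m along its optical axis.
T_world_cam = pose_matrix(np.eye(3), [0.0, 0.0, 0.0])
T_world_obj = pose_matrix(np.eye(3), [0.0, 0.0, 0.5])
scene_gt = {0: [bop_gt_entry(1, model_to_camera(T_world_obj, T_world_cam))]}
print(json.dumps(scene_gt))  # image id 0 is serialised as the JSON key "0"
```

Keeping the frame conversion separate from serialisation means the same simulated object poses can be exported for any number of rendered viewpoints.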

Abstract

We introduce the Physically Enhanced Gaussian Splatting Simulation System (PEGASUS) for 6DoF object pose dataset generation, a versatile dataset generator based on 3D Gaussian Splatting. Environment and object representations can easily be obtained by capturing them with commodity cameras and reconstructing them with Gaussian Splatting. PEGASUS composes new scenes by merging the underlying Gaussian Splatting point cloud of an environment with those of one or more objects. A physics engine simulates natural object placement within a scene through the interaction between meshes extracted for the objects and the environment. Consequently, a large number of new scenes, static or dynamic, can be created by combining different environments and objects. By rendering scenes from various perspectives, diverse data such as RGB images, depth maps, semantic masks, and 6DoF object poses can be extracted. Our study demonstrates that training on data generated by PEGASUS enables pose estimation networks to transfer successfully from synthetic to real-world data. Moreover, we introduce the Ramen dataset, comprising 30 Japanese cup noodle items. This dataset includes spherical scans that capture images of both object hemispheres, together with the Gaussian Splatting reconstructions, making the objects compatible with PEGASUS.
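Merging an environment's Gaussian Splatting point cloud with those of posed objects, as described in the abstract, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation. It assumes the common 3DGS parameterisation (per-Gaussian mean, unit rotation quaternion in (w, x, y, z) order, per-axis scales, opacity), object poses given as a quaternion plus translation, and hypothetical per-Gaussian integer labels (0 for the environment) as one possible way to later render semantic masks. Spherical-harmonic colour coefficients of degree 1 and higher are not rotated here, so view-dependent shading of a rotated object would be slightly off.

```python
import numpy as np


def quat_multiply(q1, q2):
    """Hamilton product of (w, x, y, z) quaternions; broadcasts over leading axes."""
    w1, x1, y1, z1 = np.moveaxis(np.asarray(q1, dtype=float), -1, 0)
    w2, x2, y2, z2 = np.moveaxis(np.asarray(q2, dtype=float), -1, 0)
    return np.stack([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ], axis=-1)


def quat_to_matrix(q):
    """3x3 rotation matrix of a single (w, x, y, z) quaternion (normalised first)."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def transform_gaussians(gaussians, q_pose, t_pose):
    """Apply a rigid pose to an object's Gaussians (means and orientations)."""
    R = quat_to_matrix(q_pose)
    means = gaussians["means"] @ R.T + np.asarray(t_pose, dtype=float)
    # Covariance R_g S S^T R_g^T becomes (R R_g) S S^T (R R_g)^T, so rotations compose.
    quats = quat_multiply(q_pose, gaussians["quats"])
    quats /= np.linalg.norm(quats, axis=-1, keepdims=True)
    # Scales and opacities are unchanged by a rigid transform.
    return {**gaussians, "means": means, "quats": quats}


def merge_scene(environment, placed_objects):
    """Concatenate environment Gaussians with posed object Gaussians.

    placed_objects: list of (gaussians, q_pose, t_pose, obj_id) tuples.
    """
    parts = [environment]
    labels = [np.zeros(len(environment["means"]), dtype=np.int32)]  # 0 = background
    for gaussians, q_pose, t_pose, obj_id in placed_objects:
        parts.append(transform_gaussians(gaussians, q_pose, t_pose))
        labels.append(np.full(len(gaussians["means"]), obj_id, dtype=np.int32))
    merged = {key: np.concatenate([p[key] for p in parts])
              for key in ("means", "quats", "scales", "opacities")}
    merged["labels"] = np.concatenate(labels)
    return merged
```

In PEGASUS the object poses come from the physics simulation; here a pose such as q_pose = (cos(θ/2), 0, 0, sin(θ/2)) simply rotates an object by θ about the vertical axis before it is added to the environment.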

Paper Structure (15 sections, 3 equations, 4 figures)

Figures (4)

  • Figure 1: Representative scenes generated by PEGASUS. By separately reconstructing objects and environments with Gaussian Splatting and connecting them to a physics engine, a vast variety of scenes can be generated using novel view synthesis. At each snapshot, multiple data points such as RGB images, segmentation masks, depth maps, 2D/3D bounding boxes, and object poses can be extracted.
  • Figure 2: Pipeline of the PEGASUS dataset generator. The 3DGS base environment (see the Base Environment subsection) comprises both the 3DGS reconstruction and a mesh reconstructed from its point cloud. The 'Object' includes the 3DGS representation of the object (discussed as the photometric entity) and a low-poly mesh of the same object (covered as the geometric entity). Using the mesh of the base environment and the object entities, an arbitrary number of objects can be simulated in the physics engine (see the Physics Engine subsection), enabling realistic and random placement of the objects within the scene. When the trajectories of the objects are applied to the photometric instances of the environment and the objects, dynamic and static scenes can be rendered from various viewpoints and time steps. These data are then saved in the BOP data format [BOP].
  • Figure 3: Gallery of data generated by PEGASUS. It shows scenes generated with 9 different base environments and arbitrary combinations of the 30 objects from the Ramen dataset and the 21 YCB objects [YCB_Objects] from the YCB-V dataset.
  • Figure 4: The 30 objects recorded for our Ramen dataset, all common Japanese cup noodles available at most supermarkets.
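Figures 1 and 2 describe rendering each composed scene from various viewpoints with novel view synthesis. The Python sketch below shows one simple way to generate such viewpoints; it is an assumption-laden illustration, not the sampling strategy PEGASUS uses. It assumes a z-up world and camera-to-world poses in the OpenCV convention (x right, y down, z forward); the ring layout, radius, and the helper names `look_at` and `hemisphere_poses` are hypothetical. A renderer that expects world-to-camera (view) matrices would take the inverse of each pose.

```python
import numpy as np


def look_at(cam_pos, target, world_up=(0.0, 0.0, 1.0)):
    """Camera-to-world 4x4 pose in OpenCV convention (+x right, +y down, +z forward).

    Undefined when the viewing direction is parallel to world_up.
    """
    cam_pos = np.asarray(cam_pos, dtype=float)
    z = np.asarray(target, dtype=float) - cam_pos  # forward axis points at the target
    z /= np.linalg.norm(z)
    x = np.cross(z, world_up)  # right axis
    x /= np.linalg.norm(x)
    y = np.cross(z, x)  # down axis; completes a right-handed frame
    T = np.eye(4)
    T[:3, :3] = np.column_stack([x, y, z])
    T[:3, 3] = cam_pos
    return T


def hemisphere_poses(target, radius, n_azimuth=8, elevations_deg=(15.0, 45.0, 75.0)):
    """Cameras on rings of the upper hemisphere around `target`, all looking at it.

    Elevation 90 degrees is avoided because look_at degenerates straight overhead.
    """
    target = np.asarray(target, dtype=float)
    poses = []
    for elev in np.radians(elevations_deg):
        for az in np.linspace(0.0, 2.0 * np.pi, n_azimuth, endpoint=False):
            offset = radius * np.array([np.cos(elev) * np.cos(az),
                                        np.cos(elev) * np.sin(az),
                                        np.sin(elev)])
            poses.append(look_at(target + offset, target))
    return np.stack(poses)


# Usage: 3 elevation rings x 8 azimuths around a scene centred at the origin.
poses = hemisphere_poses(target=[0.0, 0.0, 0.0], radius=0.8)
print(poses.shape)  # (24, 4, 4)
```

Rings of fixed elevation give even coverage around the scene; randomised azimuths or radii would be an easy extension for more diverse training views.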