GRADE: Generating Realistic And Dynamic Environments for Robotics Research with Isaac Sim

Elia Bonetto, Chenghao Xu, Aamir Ahmad

TL;DR

This work presents GRADE (Generating Realistic and Dynamic Environments), a highly customizable framework built upon NVIDIA Isaac Sim. It also introduces a novel experiment repetition approach that replays previous simulations within physics-enabled environments while varying the environment and scenario, enabling flexible and continuous testing, development, and data generation.

Abstract

Synthetic data and novel rendering techniques have greatly influenced computer vision research in tasks like target tracking and human pose estimation. However, robotics research has lagged behind in leveraging them due to the limitations of most simulation frameworks, including the lack of low-level software control and flexibility, Robot Operating System integration, realistic physics, or photorealism. This has hindered progress in (visual-)perception research, e.g., in autonomous robotics, especially in dynamic environments. Visual Simultaneous Localization and Mapping (V-SLAM), for instance, has been mostly developed passively, in static environments, and evaluated on a few pre-recorded dynamic datasets due to the difficulties of realistically simulating dynamic worlds and the huge sim-to-real gap. To address these challenges, we present GRADE (Generating Realistic and Dynamic Environments), a highly customizable framework built upon NVIDIA Isaac Sim. We leverage Isaac's rendering capabilities and low-level APIs to populate and control the simulation, collect ground-truth data, and test online and offline approaches. Importantly, we introduce a new way to precisely repeat a recorded experiment within a physics-enabled simulation while allowing environmental and simulation changes. Next, we collect a synthetic dataset of richly annotated videos in dynamic environments with a flying drone. Using that, we train detection and segmentation models for humans, closing the syn-to-real gap. Finally, we benchmark state-of-the-art dynamic V-SLAM algorithms, revealing their short tracking times and low generalization capabilities. We also show for the first time that the top-performing deep learning models do not achieve the best SLAM performance. Code and data are available as open source at https://grade.is.tue.mpg.de.

Paper Structure

This paper contains 26 sections, 7 figures, 11 tables, 1 algorithm.

Figures (7)

  • Figure 1: An example of the data generated using our simulation framework GRADE, with assets from Cloth3D humans and one of the environments from 3D-Front. Top row, left to right: rendered RGB image, corresponding depth map, optical flow, and surface normals. Bottom row, left to right: 2D bounding boxes, semantic instances, semantic segmentation, and SMPL shapes. Best viewed in color.
  • Figure 2: The RGB images are in the top row, with the associated instance segmentations (randomly colored) below. For the multi-robot UAV images, we highlight the other robots in the field of view with a red box. Best viewed in color.
  • Figure 3: Recap of the main components of the GRADE framework. With a blue background, we highlight the software developed within the scope of this work and give reference to the specific repository in the footnotes.
  • Figure 4: A schematic of the ROS-free system used in the Savanna simulation. In blue, we highlight our customizations.
  • Figure 5: A schematic of the main dataset generation pipeline. In blue, we highlight our customizations.
  • ...and 2 more figures