Going into Orbit: Massively Parallelizing Episodic Reinforcement Learning

Jan Oberst; Johann Bonneau

TL;DR

The paper addresses the challenge of efficiently training robotic reinforcement learning agents in realistic simulations. It introduces NVIDIA's Orbit, a GPU-accelerated framework that integrates Isaac Sim with multiple RL libraries, and presents a detailed box-pushing benchmark implemented in it. The benchmark covers both a step-based pathway and a black-box reinforcement learning (BBRL) pathway built on movement primitives such as Probabilistic Movement Primitives (ProMPs). Experiments compare Orbit to Fancy Gym (MuJoCo) and tune Orbit for high parallelism (up to 4096 environments). They show substantial gains in sample throughput and training speed, while also exposing reproducibility and simulator-variance issues across platforms. The findings underscore Orbit's potential to accelerate robotics RL research and benchmarking, and point to future work on broader benchmarks, randomization strategies, and cross-simulator comparisons to better characterize performance and generalization.

Abstract

Deep reinforcement learning has greatly expanded the possibilities of robot control across various domains. To overcome safety and sample-efficiency issues, deep reinforcement learning models can be trained in a simulation environment, which allows for faster iteration cycles. Training can be accelerated further by parallelizing it on GPUs. NVIDIA's open-source robot learning framework Orbit leverages this potential by wrapping tensor-based reinforcement learning libraries for high parallelism and building upon Isaac Sim for its simulations. We contribute a detailed description of the implementation of a benchmark reinforcement learning task, namely box pushing, using Orbit. Additionally, we benchmark the performance of our implementation against a CPU-based implementation and report the performance metrics. Finally, we tune the hyperparameters of our implementation and show that Orbit lets us generate significantly more samples in the same amount of time.

Paper Structure (17 sections, 5 equations, 8 figures, 3 tables)

Introduction
Problem Statement
Structure of Orbit
Benchmark Environment Setup
Implementation
Step-Based Environment
Creating a Base Environment
Creating a RL Environment
Black-Box Environment
Black-Box Reinforcement Learning
MP-based environment
Results & Evaluation
Benchmark: Orbit vs. Fancy Gym
Tuning Orbit
Conclusion & Outlook
...and 2 more sections

Figures (8)

Figure 1: Relationship of the components for the experiments in this study. Green: The simulation framework unifies the environments, learning libraries, and simulator to create a robot learning environment. Gray: Specific components used in this study.
Figure 2: Box pushing environment: The agent is supposed to push the box into a designated goal pose by moving the table-mounted Franka robot arm.
Figure 3: Overview of the main file structure of the box pushing environment in Orbit (blue: folders, gray: files, Note: The MDP folder and the contained files are not represented as they are not relevant for the configuration of the environment).
Figure 4: Figure taken from otto2023deep. Overview of the proposed Black-Box Reinforcement Learning (BBRL) framework. Given the context $c$, the normally distributed policy predicts the parameters of a movement primitive, which translate to a desired trajectory $\tau^d$. A trajectory tracking controller $f$ generates low-level actions $a_t$ given the current state $s_t$ and the desired state $s_t^d$ from the trajectory $\tau^d$.
Figure 5: Overview of the interaction of step-based and black-box reinforcement learning (BBRL) agents with their environment. Right: Step-based reinforcement learning agent performing a step. Left: Black-box reinforcement learning agent performing a step.
...and 3 more figures
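
The BBRL pipeline described in the Figure 4 caption can be illustrated with a minimal sketch. It is not the implementation used in the paper or in Orbit: the function names, the normalized Gaussian basis, the PD gains, and the one-dimensional unit-mass plant are all assumptions chosen for illustration. The sketch samples movement-primitive weights from a normal distribution, turns them into a desired trajectory, and tracks that trajectory with a simple controller that emits one low-level action per simulation step.

```python
import math
import random


def rbf_features(phase, n_basis, width=0.02):
    """Normalized Gaussian basis functions at a phase in [0, 1] (assumed basis)."""
    centers = [i / (n_basis - 1) for i in range(n_basis)]
    acts = [math.exp(-((phase - c) ** 2) / (2.0 * width)) for c in centers]
    total = sum(acts)
    return [a / total for a in acts]


def sample_weights(mean, std, rng):
    """Hypothetical policy: draw primitive weights from a normal distribution."""
    return [rng.gauss(m, std) for m in mean]


def desired_trajectory(weights, n_steps):
    """Map movement-primitive weights to a desired position trajectory."""
    traj = []
    for t in range(n_steps):
        phase = t / (n_steps - 1)
        feats = rbf_features(phase, len(weights))
        traj.append(sum(w * f for w, f in zip(weights, feats)))
    return traj


def rollout(weights, n_steps=100, dt=0.01, kp=200.0, kd=20.0):
    """Track the desired trajectory with a PD controller on a toy unit-mass plant."""
    tau_d = desired_trajectory(weights, n_steps)
    pos, vel = tau_d[0], 0.0
    positions = []
    for t in range(n_steps):
        # Desired velocity from a finite difference of the desired positions.
        vel_d = (tau_d[t] - tau_d[t - 1]) / dt if t > 0 else 0.0
        # Low-level action a_t computed from current and desired state.
        action = kp * (tau_d[t] - pos) + kd * (vel_d - vel)
        vel += action * dt
        pos += vel * dt
        positions.append(pos)
    return tau_d, positions


if __name__ == "__main__":
    rng = random.Random(0)
    weights = sample_weights(mean=[0.0, 0.2, 0.5, 0.8, 1.0], std=0.05, rng=rng)
    tau_d, positions = rollout(weights)
    print("final desired position:", tau_d[-1])
    print("final tracked position:", positions[-1])
```

From the agent's point of view, one call to `rollout` corresponds to a single black-box step: the policy chooses the weights once, and the whole trajectory is executed before any feedback is returned.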
