Efficient Parallel Reinforcement Learning Framework using the Reactor Model

Jacky Kwok, Marten Lohstroh, Edward A. Lee

TL;DR

This work targets the bottlenecks of parallel reinforcement learning (RL) on single-node multi-core hardware by applying Lingua Franca (LF), a coordination language based on the reactor model, which enables deterministic, fine-grained concurrency over fixed communication patterns. LF compiles to C and Python with a no-GIL runtime, which enables true parallelism and automatic generation of dataflow graphs for RL tasks while reducing synchronization and I/O overhead relative to the Ray framework. Compared with Ray, LF delivers 1.21x and 11.62x higher simulation throughput in OpenAI Gym and Atari environments, respectively, a 31.2% reduction in average training time for synchronized parallel Q-learning, and a 5.12x speedup in multi-agent RL inference. The paper also discusses the advantages of multithreading over multiprocessing and outlines a path toward federated execution and embedded deployments.

Abstract

Parallel Reinforcement Learning (RL) frameworks are essential for mapping RL workloads to multiple computational resources, allowing for faster generation of samples, estimation of values, and policy improvement. These computational paradigms require seamless integration of training, serving, and simulation workloads. Existing frameworks, such as Ray, do not manage this orchestration efficiently, especially in RL tasks that demand intensive input/output and synchronization between actors on a single node. In this study, we propose a solution based on the reactor model, which requires a set of actors to follow a fixed communication pattern. This allows the scheduler to eliminate work needed for synchronization, such as acquiring and releasing locks for each actor or sending and processing coordination-related messages. Our framework, Lingua Franca (LF), a coordination language based on the reactor model, also supports true parallelism in Python and provides a unified interface that allows users to automatically generate dataflow graphs for RL tasks. In comparison to Ray on a single-node multi-core compute platform, LF achieves 1.21x and 11.62x higher simulation throughput in OpenAI Gym and Atari environments, reduces the average training time of synchronized parallel Q-learning by 31.2%, and accelerates multi-agent RL inference by 5.12x.
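
To illustrate why a fixed communication pattern can reduce synchronization work, the following is a minimal Python sketch, not the LF runtime or its API. It assumes a hypothetical broadcast-and-gather topology (one `policy` node, four `env_i` nodes, one `learner` node) and hypothetical helper names (`assign_levels`, `run_step`). Because the edges are known before execution, the nodes can be grouped into dependency levels once, and each level can then run on a thread pool with no per-node locks: every node reads only outputs from earlier levels and writes only its own result.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def assign_levels(nodes, edges):
    """Group nodes into dependency levels (Kahn-style topological layering)."""
    indegree = {n: 0 for n in nodes}
    downstream = defaultdict(list)
    for src, dst in edges:
        indegree[dst] += 1
        downstream[src].append(dst)

    levels = []
    frontier = [n for n in nodes if indegree[n] == 0]
    while frontier:
        levels.append(frontier)
        next_frontier = []
        for n in frontier:
            for m in downstream[n]:
                indegree[m] -= 1
                if indegree[m] == 0:
                    next_frontier.append(m)
        frontier = next_frontier

    if sum(len(level) for level in levels) != len(nodes):
        raise ValueError("communication pattern contains a cycle")
    return levels


def run_step(nodes, edges, reactions, pool):
    """Run every node once, level by level, without any locks."""
    upstream = defaultdict(list)
    for src, dst in edges:
        upstream[dst].append(src)

    outputs = {}
    for level in assign_levels(nodes, edges):
        # Nodes in the same level never depend on each other, so they only
        # read results of earlier levels; results are merged after the level.
        results = list(
            pool.map(lambda n: reactions[n]([outputs[u] for u in upstream[n]]), level)
        )
        outputs.update(zip(level, results))
    return outputs


# Hypothetical topology: broadcast from one policy node, gather at a learner.
envs = [f"env_{i}" for i in range(4)]
nodes = ["policy", *envs, "learner"]
edges = [("policy", e) for e in envs] + [(e, "learner") for e in envs]

reactions = {"policy": lambda inputs: 2, "learner": lambda inputs: sum(inputs)}
for i, e in enumerate(envs):
    # Each environment scales the broadcast value by its (1-based) index.
    reactions[e] = lambda inputs, k=i + 1: inputs[0] * k

levels = assign_levels(nodes, edges)
with ThreadPoolExecutor(max_workers=4) as pool:
    outputs = run_step(nodes, edges, reactions, pool)
```

In this sketch the only synchronization point is the implicit barrier between levels. Note that under standard CPython the thread pool does not execute pure-Python reactions in parallel because of the GIL; the sketch shows the scheduling idea only, not the performance behavior reported above.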

Paper Structure

This paper contains 24 sections and 11 figures.

Figures (11)

  • Figure 1: LF: compilation process
  • Figure 2: Generated Dataflow Graph for parallel RL tasks
  • Figure 3: Scheduling mechanism in the LF runtime (Menard et al., 2023)
  • Figure 4: Mean overhead of broadcasting and gathering a 10 MB object with different numbers of actors, using Ray and LF.
  • Figure 5: Mean overhead of broadcast and gather on 16 actors with different object sizes, using Ray and LF.
  • ...and 6 more figures