Efficient Parallel Reinforcement Learning Framework using the Reactor Model
Jacky Kwok, Marten Lohstroh, Edward A. Lee
TL;DR
This work tackles the bottlenecks of parallel reinforcement learning (RL) on single-node multi-core hardware by applying Lingua Franca (LF), a coordination language based on the reactor model, which enables deterministic, fine-grained concurrency with fixed communication patterns. LF compiles to C and Python with a No-GIL runtime, enabling true parallelism and automatic generation of dataflow graphs for RL tasks while reducing synchronization and I/O overhead relative to the Ray framework. Empirically, LF delivers 1.21x and 11.62x higher simulation throughput in Gym and Atari environments, reduces the average training time of synchronized parallel Q-learning by 31.2%, and speeds up multi-agent RL inference by 5.12x, demonstrating the practical impact of deterministic reactors and lock-free scheduling on RL workloads. The paper also discusses the advantages of multithreading over multiprocessing and outlines a path toward federated execution and embedded deployments.
Abstract
Parallel Reinforcement Learning (RL) frameworks are essential for mapping RL workloads to multiple computational resources, allowing for faster sample generation, value estimation, and policy improvement. These frameworks must seamlessly integrate training, serving, and simulation workloads. Existing frameworks, such as Ray, do not manage this orchestration efficiently, especially in RL tasks that demand intensive input/output and synchronization between actors on a single node. In this study, we propose a solution based on the reactor model, which constrains a set of actors to a fixed communication pattern. This allows the scheduler to eliminate work needed for synchronization, such as acquiring and releasing locks for each actor or sending and processing coordination-related messages. Our framework, Lingua Franca (LF), a coordination language based on the reactor model, also supports true parallelism in Python and provides a unified interface that allows users to automatically generate dataflow graphs for RL tasks. Compared with Ray on a single-node multi-core compute platform, LF achieves 1.21x and 11.62x higher simulation throughput in OpenAI Gym and Atari environments, reduces the average training time of synchronized parallel Q-learning by 31.2%, and accelerates multi-agent RL inference by 5.12x.
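The central claim above, that a fixed communication pattern lets the scheduler skip per-actor locking and coordination messages, can be illustrated with a small Python sketch. This is not LF's API or runtime (LF programs are written in LF source and compiled); every name here (`Reaction`, `build_levels`, `run_logical_step`, the `env0`/`env1`/`agg` reactions) is hypothetical. The sketch computes dependency levels once from the static connections, then runs each logical step level by level: reactions in the same level execute concurrently on a thread pool, separated only by one barrier per level, and no lock guards port state because each port has exactly one writer. On a standard CPython build the GIL serializes these threads, so the sketch shows the scheduling structure, not the parallel speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class Reaction:
    """A hypothetical reaction: reads named input ports, writes named output ports."""

    def __init__(self, name, inputs, outputs, fn):
        self.name = name
        self.inputs = inputs    # port names this reaction reads
        self.outputs = outputs  # port names this reaction writes
        self.fn = fn            # fn(dict of input values) -> dict of output values


def build_levels(reactions):
    """Group reactions into dependency levels, computed once from the fixed topology.

    A reaction's level is 1 + the highest level of any reaction writing one of its
    inputs, so reactions in the same level never read each other's outputs.
    Assumes the graph is acyclic within one logical step.
    """
    writer = {}
    for r in reactions:
        for p in r.outputs:
            if p in writer:
                raise ValueError(f"port {p!r} has more than one writer")
            writer[p] = r

    level = {}

    def level_of(r):
        if r.name not in level:
            upstream = [writer[p] for p in r.inputs if p in writer]
            level[r.name] = 1 + max((level_of(u) for u in upstream), default=-1)
        return level[r.name]

    groups = defaultdict(list)
    for r in reactions:
        groups[level_of(r)].append(r)
    return [groups[k] for k in sorted(groups)]


def run_logical_step(levels, external_inputs, pool):
    """Execute one logical time step level by level, with no locks on port state."""
    ports = dict(external_inputs)
    for group in levels:
        # Inputs are copied out of `ports` in the main thread before submission, and
        # all of them were produced by earlier levels, so the reactions in this
        # level can run concurrently without sharing mutable state.
        futures = [pool.submit(r.fn, {p: ports[p] for p in r.inputs}) for r in group]
        # Barrier: collect every result before the next level starts. Each port has
        # a single writer, so merging in list order is deterministic.
        for r, fut in zip(group, futures):
            out = fut.result()
            for p in r.outputs:
                ports[p] = out[p]
    return ports


# Hypothetical step: two environments produce observations that feed an aggregator.
reactions = [
    Reaction("env0", ["a0"], ["obs0"], lambda x: {"obs0": x["a0"] * 2}),
    Reaction("env1", ["a1"], ["obs1"], lambda x: {"obs1": x["a1"] + 10}),
    Reaction("agg", ["obs0", "obs1"], ["batch"],
             lambda x: {"batch": [x["obs0"], x["obs1"]]}),
]
levels = build_levels(reactions)  # computed once, reused on every step
with ThreadPoolExecutor(max_workers=2) as pool:
    ports = run_logical_step(levels, {"a0": 3, "a1": 4}, pool)
print(ports["batch"])  # [6, 14]
```

In this sketch, `env0` and `env1` land in level 0 and run concurrently, while `agg` waits for them in level 1. Because the ordering is derived from the static connections ahead of time, the per-step coordination cost is one barrier per level rather than a lock or message exchange per actor interaction.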
