Synchronous vs Asynchronous Reinforcement Learning in a Real World Robot
Ali Parsaee, Fahim Shahriar, Chuxin He, Ruiqing Tan
TL;DR
This work addresses real-time control on physical robots by comparing synchronous and asynchronous reinforcement learning with Soft Actor-Critic on a Franka Emika Panda arm. Using the ReLoD system and the Franka-Env, the study shows that asynchronous RL learns faster and attains higher returns, and that the gain comes primarily from shorter response time and higher sample throughput rather than from the number of gradient updates. The results support asynchronous architectures for real-time robotics and offer guidance on choosing the action cycle time and organizing data flow in RL systems. They are relevant to deploying RL-based controllers in dynamic real-world settings and to future research on multi-process RL for robotics.
Abstract
Reinforcement learning (RL) with physical robots has recently attracted the attention of a wide range of researchers. However, state-of-the-art RL algorithms do not account for the fact that physical environments do not wait for the RL agent to make decisions or perform updates. RL agents learn by periodically conducting computationally expensive gradient updates. When an RL agent on a physical robot carries out decision-making and gradient updates sequentially, its response time increases significantly. In a rapidly changing environment, this increased response time may be detrimental to the performance of the learning agent. Asynchronous RL methods, which separate the computation of decision-making from that of gradient updates, are a potential solution to this problem. However, only a few comparisons between asynchronous and synchronous RL have been made with physical robots, so the exact performance benefits of asynchronous over synchronous RL methods remain unclear. In this study, we provide a performance comparison between asynchronous and synchronous RL using the Franka Emika Panda, a physical robotic arm. Our experiments show that agents learn faster and attain significantly higher returns with asynchronous RL. They also demonstrate that a learning agent with a faster response time performs better than an agent with a slower response time, even if the slower agent performs a higher number of gradient updates.
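The difference between the two schemes can be illustrated with a minimal Python sketch. It is not the ReLoD implementation: the function names, the timing constants, and the use of a background thread (rather than separate processes) are assumptions made for illustration, and `time.sleep` stands in for both policy inference and a gradient update. The sketch only shows why running updates sequentially lengthens the action cycle, while moving them off the acting loop does not.

```python
import threading
import time
from collections import deque

ACT_TIME = 0.001     # hypothetical cost of one policy inference, in seconds
UPDATE_TIME = 0.02   # hypothetical cost of one gradient update, in seconds


def act():
    time.sleep(ACT_TIME)      # stand-in for computing an action
    return 0.0                # dummy transition stored in the buffer


def gradient_update(buffer):
    time.sleep(UPDATE_TIME)   # stand-in for one update on a sampled batch


def run_synchronous(steps):
    """Act and update in sequence; return the mean action cycle time."""
    buffer = deque(maxlen=1000)
    start = time.perf_counter()
    for _ in range(steps):
        buffer.append(act())
        gradient_update(buffer)   # the environment is not served while this runs
    return (time.perf_counter() - start) / steps


def run_asynchronous(steps):
    """Act in the main thread while a background thread performs updates.

    Returns the mean action cycle time and the number of completed updates.
    """
    buffer = deque(maxlen=1000)
    stop = threading.Event()
    updates = 0

    def learner():
        nonlocal updates
        while not stop.is_set():
            gradient_update(buffer)
            updates += 1

    thread = threading.Thread(target=learner, daemon=True)
    thread.start()
    start = time.perf_counter()
    for _ in range(steps):
        buffer.append(act())      # acting never waits for the learner
    elapsed = time.perf_counter() - start
    stop.set()
    thread.join()
    return elapsed / steps, updates


if __name__ == "__main__":
    sync_cycle = run_synchronous(50)
    async_cycle, n_updates = run_asynchronous(50)
    print(f"synchronous  mean cycle: {sync_cycle * 1e3:.1f} ms")
    print(f"asynchronous mean cycle: {async_cycle * 1e3:.1f} ms ({n_updates} updates)")
```

In this toy setting the synchronous loop performs one update per step but responds slowly, whereas the asynchronous loop responds quickly and completes fewer updates per step. That is the trade-off the experiments examine on the real robot.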
