Thinking While Moving: Deep Reinforcement Learning with Concurrent Control
Ted Xiao, Eric Jang, Dmitry Kalashnikov, Sergey Levine, Julian Ibarz, Karol Hausman, Alexander Herzog
TL;DR
The paper addresses reinforcement learning in concurrent environments, where the system keeps evolving while the policy is still selecting its next action. It formulates the Bellman equations in continuous time and then discretizes them in a latency-aware way. It shows that augmenting Q-learning with a small amount of concurrency information, in particular the previous action and the action-selection latency, keeps the Bellman operator a contraction, so the usual convergence argument still applies. It also introduces vector-to-go (VTG), the part of the previous action that remains to be executed when the state is observed, as a robust representation of that information. On toy control tasks and large-scale robotic grasping experiments, including real-robot results, the concurrent policies are faster and smoother than blocking baselines while reaching comparable task success. The work demonstrates practical gains in speed and motion quality for real-time robotic control and points toward extensions to other RL methods and latency regimes.
Abstract
We study reinforcement learning in settings where sampling an action from the policy must be done concurrently with the time evolution of the controlled system, such as when a robot must decide on the next action while still performing the previous action. Much like a person or an animal, the robot must think and move at the same time, deciding on its next action before the previous one has completed. In order to develop an algorithmic framework for such concurrent control problems, we start with a continuous-time formulation of the Bellman equations, and then discretize them in a way that is aware of system delays. We instantiate this new class of approximate dynamic programming methods via a simple architectural extension to existing value-based deep reinforcement learning algorithms. We evaluate our methods on simulated benchmark tasks and a large-scale robotic grasping task where the robot must "think while moving".
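As a rough illustration of the "simple architectural extension" mentioned above, the sketch below appends concurrency information (the previous action and the action-selection latency) to the observation before it reaches a value function. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `augment_observation`, `one_hot`, and `LinearQ` are hypothetical, the linear Q-function stands in for a deep network, and the update is ordinary one-step Q-learning rather than the latency-aware backup derived in the paper.

```python
from typing import List, Sequence


def one_hot(index: int, size: int) -> List[float]:
    """Encode a discrete action index as a one-hot vector."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def augment_observation(obs: Sequence[float],
                        prev_action: Sequence[float],
                        latency_s: float) -> List[float]:
    """Append concurrency features to the raw observation.

    Assumption: the previous action and the action-selection latency
    (in seconds) are simply concatenated onto the observation.
    """
    return list(obs) + list(prev_action) + [latency_s]


class LinearQ:
    """Hypothetical linear Q-function with one weight row per discrete action."""

    def __init__(self, n_features: int, n_actions: int,
                 lr: float = 0.1, gamma: float = 0.9) -> None:
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr
        self.gamma = gamma

    def q_values(self, x: Sequence[float]) -> List[float]:
        """Return Q(x, a) for every action a."""
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]

    def update(self, x: Sequence[float], action: int, reward: float,
               x_next: Sequence[float], done: bool) -> float:
        """Apply one standard Q-learning step and return the TD error."""
        bootstrap = 0.0 if done else self.gamma * max(self.q_values(x_next))
        td_error = reward + bootstrap - self.q_values(x)[action]
        for i, xi in enumerate(x):
            self.w[action][i] += self.lr * td_error * xi
        return td_error


# Usage: the robot is still executing action 1 while it selects action 2.
x = augment_observation([0.5, -0.2], one_hot(1, 3), latency_s=0.03)
x_next = augment_observation([0.4, -0.1], one_hot(2, 3), latency_s=0.03)
q = LinearQ(n_features=len(x), n_actions=3)
td = q.update(x, action=2, reward=1.0, x_next=x_next, done=False)
```

The point of the sketch is only that the learner's input changes while the learning rule stays familiar. A VTG feature could be appended in the same way as the previous action.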
