Deep Reinforcement Learning in Parameterized Action Space
Matthew Hausknecht, Peter Stone
TL;DR
This work extends the Deep Deterministic Policy Gradients (DDPG) framework to parameterized action spaces by introducing a method for bounding action-space gradients, which enables stable learning when continuous action parameters are bounded. Using the RoboCup 2D Half Field Offense domain, the authors train an actor-critic network to simultaneously select a discrete action type (Dash, Turn, Tackle, or Kick) and its continuous parameters, learning goal-directed behavior from scratch. Empirical results show that the inverting-gradients bounding strategy yields robust learning: multiple agents learn to approach the ball, kick toward the goal, and score, with several surpassing a strong hand-coded Helios champion agent and outperforming a SARSA baseline. The findings demonstrate that deep reinforcement learning is viable and practical in parameterized action spaces, and they provide a technique for bounded continuous actions that applies beyond RoboCup.
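The inverting-gradients idea mentioned above can be illustrated with a minimal sketch. It assumes the commonly described form of the rule: a gradient that would increase a parameter is scaled by the remaining headroom to the upper bound, a gradient that would decrease it is scaled by the headroom to the lower bound, and the sign flips once the parameter has left its range. The function name `invert_gradient` and the scalar, pure-Python formulation are illustrative assumptions, not the authors' implementation.

```python
def invert_gradient(grad, p, p_min, p_max):
    """Scale (and possibly invert) a gradient for a bounded parameter.

    grad  : gradient with respect to p, in the ascent direction
            (positive means "increase p")
    p     : current parameter value produced by the actor
    p_min : lower bound of the parameter (assumed p_min < p_max)
    p_max : upper bound of the parameter
    """
    width = p_max - p_min
    if grad > 0:
        # Gradient suggests increasing p: shrink it as p nears p_max.
        # If p > p_max the factor is negative, so the gradient is inverted.
        return grad * (p_max - p) / width
    # Gradient suggests decreasing p: shrink it as p nears p_min.
    # If p < p_min the factor is negative, so the gradient is inverted.
    return grad * (p - p_min) / width
```

For a parameter bounded in [-1, 1], a unit gradient at p = 0 is halved, vanishes at p = 1, and changes sign at p = 1.5, which pushes the parameter back toward its valid range.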
Abstract
Recent work has shown that deep neural networks are capable of approximating both value functions and policies in reinforcement learning domains featuring continuous state and action spaces. However, to the best of our knowledge no previous work has succeeded at using deep neural networks in structured (parameterized) continuous action spaces. To fill this gap, this paper focuses on learning within the domain of simulated RoboCup soccer, which features a small set of discrete action types, each of which is parameterized with continuous variables. The best learned agent can score goals more reliably than the 2012 RoboCup champion agent. As such, this paper represents a successful extension of deep reinforcement learning to the class of parameterized action space MDPs.
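To make the notion of a parameterized action concrete, the following sketch shows one way an actor's output could be decoded into a single action. It assumes the actor emits one score per discrete action type plus a flat vector of continuous parameters, and that the highest-scoring type is chosen along with its own parameters. The parameter layout (power and direction for Dash and Kick, direction only for Turn and Tackle) and the names `ACTION_PARAMS` and `select_action` are assumptions made for illustration.

```python
# Assumed layout: each discrete action type owns a slice of the parameter vector.
ACTION_PARAMS = {
    "Dash": ("power", "direction"),
    "Turn": ("direction",),
    "Tackle": ("direction",),
    "Kick": ("power", "direction"),
}


def select_action(type_scores, params):
    """Pick the highest-scoring action type and return it with its parameters.

    type_scores : one score per action type, in ACTION_PARAMS order
    params      : flat list of continuous parameters for all action types
    """
    names = list(ACTION_PARAMS)
    best = max(range(len(names)), key=lambda i: type_scores[i])
    # Skip over the parameters that belong to earlier action types.
    offset = sum(len(ACTION_PARAMS[n]) for n in names[:best])
    fields = ACTION_PARAMS[names[best]]
    values = params[offset:offset + len(fields)]
    return names[best], dict(zip(fields, values))
```

For example, `select_action([0.1, 0.7, 0.0, 0.2], [50, 10, 30, -20, 80, 5])` selects Turn with a direction of 30; the parameters of the unselected action types are ignored.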
