A Brief Survey of Deep Reinforcement Learning
Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, Anil Anthony Bharath
TL;DR
Deep reinforcement learning (DRL) enables end-to-end learning for control and perception by combining reinforcement learning (RL) with deep neural networks. The paper surveys core DRL paradigms, contrasting value-based and policy-based methods, and detailing key algorithms such as the deep Q-network (DQN), trust region policy optimization (TRPO), and asynchronous advantage actor-critic (A3C). It discusses how deep representations address the curse of dimensionality, covers planning and model-based methods, and surveys current challenges including exploration, memory, transfer, and multi-agent settings. The work highlights benchmarks such as the Arcade Learning Environment (ALE) for Atari games and MuJoCo, and argues for integrating DRL with other AI techniques to achieve more data-efficient, generalizable, and capable autonomous agents.
Abstract
Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep $Q$-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.
