An Introduction to Deep Reinforcement Learning

Vincent Francois-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare, Joelle Pineau

TL;DR

This paper surveys deep reinforcement learning, framing it as the combination of reinforcement learning with deep neural networks to solve high-dimensional sequential decision problems. It categorizes approaches into value-based (e.g., DQN and its variants), policy-gradient (including actor-critic and natural/trust-region methods), and model-based methods, and discusses how each scales with neural function approximators. It emphasizes generalization as a core challenge, detailing feature selection, auxiliary tasks, objective shaping, and hierarchical learning as mechanisms to balance bias and overfitting across offline and online settings. The survey also covers benchmarking practices, exploration strategies, and the role of non-MDP settings (POMDPs, transfer/continual learning, demonstrations, and multi-agent systems) in broadening deep RL applicability, with insights into safety, reliability, and societal impact. Overall, it highlights how integrating model-based and model-free approaches, improving sample efficiency, and pursuing meta-learning and curriculum strategies are key directions for advancing deep RL toward robust, real-world deployment, while noting the need for careful evaluation and ethical considerations.

Abstract

Deep reinforcement learning is the combination of reinforcement learning (RL) and deep learning. This field of research has been able to solve a wide range of complex decision-making tasks that were previously out of reach for a machine. Thus, deep RL opens up many new applications in domains such as healthcare, robotics, smart grids, finance, and many more. This manuscript provides an introduction to deep reinforcement learning models, algorithms and techniques. Particular focus is on the aspects related to generalization and how deep RL can be used for practical applications. We assume the reader is familiar with basic machine learning concepts.

Paper Structure

This paper contains 88 sections, 44 equations, 23 figures, 1 table.

Figures (23)

  • Figure 1: Illustration of overfitting and underfitting for a simple 1D regression task in supervised learning (based on an example from the scikit-learn library; Pedregosa et al., 2011). The data points $(x,y)$ are noisy samples from a true function, shown in green. In the left panel, the degree 1 approximation underfits: it is a poor model even for the training samples. In the right panel, the degree 10 approximation fits the training samples very well but is overly complex and fails to generalize.
  • Figure 2: Example of a neural network with one hidden layer.
  • Figure 3: Illustration of a convolutional layer with one input feature map that is convolved by different filters to yield the output feature maps. The parameters that are learned for this type of layer are those of the filters. For illustration purposes, some results are displayed for one of the output feature maps with a given filter (in practice, that operation is followed by a non-linear activation function).
  • Figure 4: Illustration of a simple recurrent neural network. The layer denoted by "h" may represent any non-linear function that takes two inputs and provides two outputs. On the left is the simplified view of a recurrent neural network that is applied recursively to $(x_{t},y_{t})$ for increasing values of $t$, where the blue line represents a delay of one time step. On the right, the neural network is unfolded, with the implicit requirement that all inputs and outputs be presented simultaneously.
  • Figure 5: Agent-environment interaction in RL.
  • ...and 18 more figures
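The underfitting/overfitting contrast of Figure 1 can be reproduced with a small pure-Python sketch. It is not the figure's own code: the true function $\cos(1.5\pi x)$, the 15-point training grid, the noise level (standard deviation 0.1) and the random seed are assumptions chosen for illustration, and the fit solves the least-squares normal equations with exact fractions rather than calling scikit-learn.

```python
import math
import random
from fractions import Fraction
from typing import Dict, List, Sequence


def true_fn(x: float) -> float:
    # Assumed ground-truth function; the figure caption does not state which one it uses.
    return math.cos(1.5 * math.pi * x)


def fit_poly(xs: Sequence, ys: Sequence, degree: int) -> List[Fraction]:
    """Least-squares polynomial fit: solve the normal equations (X^T X) w = X^T y exactly.

    Exact rational arithmetic sidesteps the ill-conditioning that floats suffer
    with high-degree Vandermonde matrices. Coefficients are returned lowest degree first.
    """
    n = degree + 1
    X = [[Fraction(x) ** j for j in range(n)] for x in xs]
    Y = [Fraction(y) for y in ys]
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * y for row, y in zip(X, Y))]
         for i in range(n)]
    # Gauss-Jordan elimination; with exact arithmetic any nonzero pivot is safe.
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def mse(w: Sequence[Fraction], xs: Sequence, ys: Sequence) -> Fraction:
    """Exact mean squared error of the polynomial w on the points (xs, ys)."""
    def predict(x) -> Fraction:
        return sum(c * Fraction(x) ** j for j, c in enumerate(w))
    return sum((predict(x) - Fraction(y)) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)
# 15 training inputs on an exact rational grid in [0, 1], with Gaussian noise on the targets.
train_x = [Fraction(i, 14) for i in range(15)]
train_y = [true_fn(float(x)) + rng.gauss(0.0, 0.1) for x in train_x]
# Held-out inputs halfway between training points, scored against the noise-free function.
test_x = [Fraction(2 * i + 1, 28) for i in range(14)]
test_y = [true_fn(float(x)) for x in test_x]

train_err: Dict[int, Fraction] = {}
test_err: Dict[int, Fraction] = {}
for degree in (1, 10):
    w = fit_poly(train_x, train_y, degree)
    train_err[degree] = mse(w, train_x, train_y)
    test_err[degree] = mse(w, test_x, test_y)
    print(f"degree {degree:2d}: train MSE {float(train_err[degree]):.4f}, "
          f"held-out MSE {float(test_err[degree]):.4f}")
```

Because every degree 1 polynomial is also a degree 10 polynomial, the degree 10 training error can never exceed the degree 1 training error. Whether its held-out error is also lower depends on the noise and the sample size, which is precisely the generalization question the figure raises.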
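The convolutional layer of Figure 3 can be sketched in a few lines. This is a minimal illustration under assumptions the caption does not fix: a single 2D input map, "valid" padding, stride 1, no bias term, ReLU as the non-linear activation, and two made-up 2x2 filters. Each filter yields one output feature map, and the filter entries are the only learnable parameters. Like most deep-learning libraries, the sketch computes a cross-correlation, which is what those libraries call a convolution.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(feature_map: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D cross-correlation with stride 1 (a 'convolution' in deep-learning terms)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature_map) - kh + 1
    out_w = len(feature_map[0]) - kw + 1
    return [[sum(feature_map[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(m: Matrix) -> Matrix:
    """Element-wise non-linear activation applied after the convolution."""
    return [[max(0.0, v) for v in row] for row in m]


def conv_layer(feature_map: Matrix, filters: List[Matrix]) -> List[Matrix]:
    """One output feature map per filter; the filter weights are the learnable parameters."""
    return [relu(conv2d_valid(feature_map, k)) for k in filters]


# Example: one 4x4 input map and two hypothetical 2x2 filters.
x = [[1, 2, 3, 0],
     [0, 1, 2, 3],
     [3, 0, 1, 2],
     [2, 3, 0, 1]]
diagonal = [[1, 0], [0, 1]]
vertical_edge = [[1, -1], [1, -1]]
outputs = conv_layer(x, [diagonal, vertical_edge])
for k, out in enumerate(outputs):
    print(f"output map {k}: {out}")
```

With "valid" padding, a 2x2 filter turns the 4x4 map into a 3x3 map. Library layers typically add padding options, strides, bias terms and multiple input channels on top of this core operation.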
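The recursive and unfolded views in Figure 4 compute the same outputs. The sketch below assumes a hypothetical scalar cell $h_t = \tanh(w_x x_t + w_h h_{t-1})$, $y_t = w_y h_t$ with made-up weights; in the figure, "h" may be any non-linear function with two inputs and two outputs, so this particular cell is only one instance.

```python
import math
from typing import List, Tuple


def rnn_cell(x_t: float, h_prev: float, w_x: float, w_h: float,
             w_y: float) -> Tuple[float, float]:
    """Hypothetical scalar cell "h": two inputs (x_t, h_{t-1}) -> two outputs (y_t, h_t)."""
    h_t = math.tanh(w_x * x_t + w_h * h_prev)
    y_t = w_y * h_t
    return y_t, h_t


def run_recurrent(xs: List[float], w_x: float = 0.5, w_h: float = 0.8,
                  w_y: float = 2.0, h0: float = 0.0) -> List[float]:
    """Recursive view: the same cell is applied for increasing t, and h is fed back
    with a delay of one time step."""
    ys, h = [], h0
    for x_t in xs:
        y_t, h = rnn_cell(x_t, h, w_x, w_h, w_y)
        ys.append(y_t)
    return ys


def run_unfolded(xs: List[float], w_x: float = 0.5, w_h: float = 0.8,
                 w_y: float = 2.0, h0: float = 0.0) -> List[float]:
    """Unfolded view: the whole input sequence is given up front and every hidden state
    h_1..h_T is materialized (one copy of the shared-weight cell per time step)."""
    hs = [h0]
    for t in range(len(xs)):
        hs.append(math.tanh(w_x * xs[t] + w_h * hs[t]))
    return [w_y * h for h in hs[1:]]


xs = [1.0, -0.5, 0.25]
print(run_recurrent(xs))
print(run_unfolded(xs))
```

The delayed feedback is what gives the network memory: for the input sequence [1.0, 0.0], the output at $t=2$ is non-zero even though $x_2 = 0$, because $h_1$ carries information about $x_1$. Keeping all hidden states, as the unfolded view does, is what backpropagation through time requires.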
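The agent-environment interaction of Figure 5 can be written as a short loop: at each time step the agent observes the state, chooses an action, and the environment returns a reward and the next state. `ChainEnv`, `run_episode`, the reward scheme and the discount factor `gamma = 0.9` are hypothetical choices for illustration, not taken from the paper.

```python
import random
from typing import Callable, List, Tuple

Transition = Tuple[int, int, float, int]  # (s_t, a_t, r_t, s_{t+1})


class ChainEnv:
    """Hypothetical toy environment with states 0..n-1 on a line, starting at 0.

    Action 1 moves right, action 0 moves left (clipped at 0). Reaching state n-1
    yields reward 1 and ends the episode; all other transitions yield reward 0.
    """

    def __init__(self, n: int = 5) -> None:
        self.n = n
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        if action == 1:
            self.state = min(self.n - 1, self.state + 1)
        else:
            self.state = max(0, self.state - 1)
        done = self.state == self.n - 1
        return self.state, (1.0 if done else 0.0), done


def run_episode(env: ChainEnv, policy: Callable[[int], int], gamma: float = 0.9,
                max_steps: int = 100) -> Tuple[float, List[Transition]]:
    """Agent-environment loop: observe s_t, choose a_t, receive r_t and s_{t+1}.

    Returns the discounted return sum_t gamma^t * r_t and the list of transitions.
    """
    s = env.reset()
    discounted_return = 0.0
    transitions: List[Transition] = []
    for t in range(max_steps):
        a = policy(s)                    # agent acts on the current state
        s_next, r, done = env.step(a)    # environment answers with reward and next state
        transitions.append((s, a, r, s_next))
        discounted_return += (gamma ** t) * r
        s = s_next
        if done:
            break
    return discounted_return, transitions


if __name__ == "__main__":
    rng = random.Random(0)
    G, traj = run_episode(ChainEnv(5), lambda s: rng.choice([0, 1]))
    print(f"random policy: {len(traj)} steps, discounted return {G:.3f}")
```

A policy that always moves right reaches the goal in four steps, so its discounted return is $\gamma^3 = 0.729$; the collected `(s, a, r, s_next)` tuples are the kind of experience that value-based and policy-gradient methods learn from.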

Theorems & Definitions (4)

  • Definition 3.1
  • Definition 3.2
  • Definition 10.1
  • Definition 10.2