How to Choose a Reinforcement-Learning Algorithm

Fabian Bongratz; Vladimir Golkov; Lukas Mautner; Luca Della Libera; Frederik Heetmeyer; Felix Czaja; Julian Rodemann; Daniel Cremers

TL;DR

The paper addresses the problem of choosing among the rapidly expanding set of deep reinforcement-learning algorithms by proposing a structured decision framework that links environment properties to algorithm properties. It classifies methods as model-free vs. model-based, on-policy vs. off-policy, distributional vs. standard value learning, and value-based vs. policy-based vs. actor-critic, and gives guidance on action-distribution families and neural-network parameterizations, such as $Q_\theta$ for value learning and the gradient $\nabla_\theta J(\theta)$ for policy optimization. Key contributions include guidelines and decision tables that map situations to method properties (e.g., on-policy vs. off-policy, distributional vs. non-distributional) and provide concrete recommendations for action-distribution choices, architectures, and training stability. The interactive online version lets practitioners tailor recommendations to their task, though the authors emphasize that there is no universal winner and that experimentation remains essential for robust performance across diverse environments.
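The idea of linking environment properties to method properties can be pictured as a small lookup function. The following is only a minimal sketch: the class `EnvironmentProperties`, its fields, and the function `suggest_method_properties` are hypothetical names, and the three rules are common rules of thumb chosen for illustration, not the paper's decision tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentProperties:
    """Hypothetical task description; the field names are illustrative assumptions."""
    continuous_actions: bool
    samples_are_expensive: bool
    model_available: bool


def suggest_method_properties(env: EnvironmentProperties) -> dict[str, str]:
    """Map environment properties to candidate method properties.

    The rules are rules of thumb used for illustration only;
    they are NOT the decision tables from the paper.
    """
    return {
        # A known or learnable model makes planning-based methods an option.
        "model_usage": "model-based" if env.model_available else "model-free",
        # Off-policy methods can reuse old experience (e.g. from a replay
        # buffer), which tends to help when collecting samples is costly.
        "data_usage": (
            "off-policy" if env.samples_are_expensive
            else "on-policy or off-policy"
        ),
        # Maximizing a Q-function over a continuous action set is hard, so
        # value learning is usually paired with an explicit actor there.
        "learner": "actor-critic" if env.continuous_actions else "value-based",
    }


# Example: a hypothetical robot-arm task with costly real-world samples.
robot_arm = EnvironmentProperties(
    continuous_actions=True,
    samples_are_expensive=True,
    model_available=False,
)
suggestion = suggest_method_properties(robot_arm)
```

The function returns properties rather than a single algorithm name, which mirrors the summary's point that no method wins universally: the output narrows the search space, and the remaining candidates still have to be compared experimentally.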

Abstract

The field of reinforcement learning offers a large variety of concepts and methods to tackle sequential decision-making problems. This variety has become so large that choosing an algorithm for a task at hand can be challenging. In this work, we streamline the process of choosing reinforcement-learning algorithms and action-distribution families. We provide a structured overview of existing methods and their properties, as well as guidelines for when to choose which methods. An interactive version of these guidelines is available online at https://rl-picker.github.io/.

Paper Structure

This paper contains 32 sections, 12 tables.

Introduction
RL algorithms
Model-free vs. model-based reinforcement learning
Hierarchical RL
Imitation learning
Distributed algorithms
Distributional algorithms
On-policy vs. off-policy learning
Target policy, behavior policy, test-time policy
Stochastic vs. deterministic target policy
Definition of on-policy and off-policy learning
Value-based vs. policy-based vs. actor-critic
Value-function learning
Entropy regularization
Action-distribution families
...and 17 more sections
