How to Choose a Reinforcement-Learning Algorithm
Fabian Bongratz, Vladimir Golkov, Lukas Mautner, Luca Della Libera, Frederik Heetmeyer, Felix Czaja, Julian Rodemann, Daniel Cremers
TL;DR
The paper addresses the problem of choosing among the rapidly expanding set of deep reinforcement-learning algorithms by proposing a structured decision framework that links environment properties to algorithm properties. It classifies methods along several axes: model-free vs. model-based, on-policy vs. off-policy, distributional vs. standard value learning, and value-based vs. policy-based vs. actor-critic. It also gives guidance on action-distribution families and on neural-network parameterizations, such as $Q_\theta$ for value learning and the policy gradient $\nabla_\theta J(\theta)$ for policy optimization. Key contributions include guidelines and decision tables that map situations to method properties (e.g., on-policy vs. off-policy, distributional vs. non-distributional) and provide concrete recommendations for action-distribution choices, architectures, and training stability. An interactive online version lets practitioners tailor the recommendations to their task, though the authors emphasize that there is no universal winner and that experimentation remains essential for robust performance across diverse environments.
Abstract
The field of reinforcement learning offers a large variety of concepts and methods to tackle sequential decision-making problems. This variety has become so large that choosing an algorithm for a task at hand can be challenging. In this work, we streamline the process of choosing reinforcement-learning algorithms and action-distribution families. We provide a structured overview of existing methods and their properties, as well as guidelines for when to choose which methods. An interactive version of these guidelines is available online at https://rl-picker.github.io/.
