On the consistency of hyper-parameter selection in value-based deep reinforcement learning
Johan Obando-Ceron, João G. M. Araújo, Aaron Courville, Pablo Samuel Castro
TL;DR
The paper tackles the reliability of hyper-parameter selection in value-based deep reinforcement learning by introducing the THC score, a ranking-based metric that quantifies how consistently hyper-parameter choices transfer across training regimes. It presents a large-scale empirical study covering 12 hyper-parameters, two agents (DER and DrQ($\epsilon$)), 26 Atari environments, and two data regimes ($100k$ and $40M$ frames), totaling roughly $108k$ training runs, complemented by a web-based appendix. The findings show limited transferability across data regimes and environments, while some hyper-parameter settings transfer reasonably well across agents; the exploration $\epsilon$ and update-period-style hyper-parameters are particularly sensitive to tuning. The work offers a practical framework for robust hyper-parameter selection and interpretable exploration of transferability, highlighting the need for environment-aware or dynamic tuning in real-world deep RL deployments.
Abstract
Deep reinforcement learning (deep RL) has achieved tremendous success in various domains through a combination of algorithmic design and careful selection of hyper-parameters. Algorithmic improvements are often the result of iterative enhancements built upon prior approaches, while hyper-parameter choices are typically inherited from previous methods or fine-tuned specifically for the proposed technique. Despite their crucial impact on performance, hyper-parameter choices are frequently overshadowed by algorithmic advancements. This paper conducts an extensive empirical study focusing on the reliability of hyper-parameter selection for value-based deep reinforcement learning agents, including the introduction of a new score to quantify the consistency and reliability of various hyper-parameters. Our findings not only help establish which hyper-parameters are most critical to tune, but also help clarify which tunings remain consistent across different training regimes.
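The exact definition of the THC score is not given in this summary. As a rough intuition for what a ranking-based consistency measure looks like, the sketch below computes plain Kendall's $\tau$ between the performance rankings a hyper-parameter's settings induce in two data regimes; the returns shown are made-up numbers, and this is an illustration of the general idea, not the paper's actual THC formula:

```python
from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Kendall rank correlation between the rankings induced by two score lists.

    +1.0 means the two regimes rank the hyper-parameter settings identically,
    -1.0 means the rankings are fully reversed.
    """
    n = len(scores_a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical normalized returns for five settings of one hyper-parameter,
# evaluated in two data regimes (100k frames vs. 40M frames).
returns_100k = [0.31, 0.45, 0.52, 0.40, 0.28]
returns_40m  = [0.50, 0.61, 0.58, 0.70, 0.44]

consistency = kendall_tau(returns_100k, returns_40m)
print(f"rank consistency across regimes: {consistency:.2f}")  # prints 0.40
```

A low value on such a measure would indicate that the hyper-parameter setting that wins in the low-data regime is unlikely to win at scale, which is the kind of (in)consistency the paper's study quantifies.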
