A Method for Evaluating Hyperparameter Sensitivity in Reinforcement Learning
Jacob Adkins, Michael Bowling, Adam White
TL;DR
Reinforcement learning performance is highly sensitive to hyperparameters and to environment-specific tuning, yet current evaluation practice often ignores this. The paper proposes two empirical metrics, hyperparameter sensitivity $\Phi(\omega)$ and effective hyperparameter dimensionality $d(\omega)$, together with percentile-normalized performance $\Gamma(\omega,e,h)$, to quantify cross-environment robustness and tuning requirements. Applying these metrics to PPO with several normalization variants, the study shows that some performance improvements come at the cost of greater hyperparameter sensitivity and higher effective dimensionality. The findings argue for holistic evaluation beyond raw performance, guiding the design of algorithms that are robust to hyperparameter choice and potentially more practical for real-world deployment.
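To make the two quantities $\Gamma(\omega,e,h)$ and $\Phi(\omega)$ concrete, here is a minimal sketch. This section does not state the formal definitions, so the forms below are assumptions: $\Gamma$ is taken to rescale raw performance in each environment by that environment's 5th and 95th percentiles, and $\Phi$ is taken to be the gap between per-environment tuning (best setting chosen separately in each environment) and cross-environment tuning (one setting shared by all environments). The function names, the array layout, and the choice to compute percentiles over a single algorithm's settings are all hypothetical.

```python
import numpy as np


def percentile_normalize(perf, lo=5, hi=95):
    """Rescale raw performance per environment (assumed form of Gamma).

    perf: array of shape (n_envs, n_settings), one row per environment and
    one column per hyperparameter setting. Assumes the lo and hi percentiles
    differ in every row; otherwise the division is undefined.
    """
    p_lo = np.percentile(perf, lo, axis=1, keepdims=True)
    p_hi = np.percentile(perf, hi, axis=1, keepdims=True)
    return (perf - p_lo) / (p_hi - p_lo)


def sensitivity(gamma):
    """Gap between per-environment and cross-environment tuning (assumed form of Phi).

    gamma: normalized scores of shape (n_envs, n_settings).
    """
    # Best setting picked separately for each environment, then averaged.
    per_env_tuned = gamma.max(axis=1).mean()
    # Single setting that maximizes the average score across environments.
    cross_env_tuned = gamma.mean(axis=0).max()
    return per_env_tuned - cross_env_tuned


# Two environments, two settings; each environment prefers a different setting.
gamma = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
print(sensitivity(gamma))  # 0.5
```

Under these assumptions, a score of zero means a single setting is simultaneously best in every environment, while larger values indicate that more of the reported performance depends on tuning each environment separately. Effective hyperparameter dimensionality $d(\omega)$ is not sketched here.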
Abstract
The performance of modern reinforcement learning algorithms critically relies on tuning ever-increasing numbers of hyperparameters. Often, small changes in a hyperparameter can lead to drastic changes in performance, and different environments require very different hyperparameter settings to achieve state-of-the-art performance reported in the literature. We currently lack a scalable and widely accepted approach to characterizing these complex interactions. This work proposes a new empirical methodology for studying, comparing, and quantifying the sensitivity of an algorithm's performance to hyperparameter tuning for a given set of environments. We then demonstrate the utility of this methodology by assessing the hyperparameter sensitivity of several commonly used normalization variants of PPO. The results suggest that several algorithmic performance improvements may, in fact, be a result of an increased reliance on hyperparameter tuning.
