A Method for Evaluating Hyperparameter Sensitivity in Reinforcement Learning

Jacob Adkins, Michael Bowling, Adam White

TL;DR

The paper addresses the problem that reinforcement learning performance is highly sensitive to hyperparameters and environment-specific tuning, which current evaluation practice often ignores. It proposes two empirical metrics, hyperparameter sensitivity $\Phi(\omega)$ and effective hyperparameter dimensionality $d(\omega)$, together with percentile-normalized performance $\Gamma(\omega,e,h)$, to quantify cross-environment robustness and tuning requirements. Applying these metrics to PPO with various normalization tricks, the study shows that some performance improvements come at the cost of greater hyperparameter sensitivity and higher effective dimensionality. The findings advocate for holistic evaluation beyond raw performance, guiding the design of algorithms that are robust to hyperparameter tuning and potentially more practical for real-world deployment.
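
To make the two score-based quantities concrete, here is a minimal Python sketch of how $\Gamma$ and $\Phi$ might be computed from a grid of results. It is not the authors' implementation. The 5th/95th percentile bounds and the clipping in `normalize` are illustrative assumptions, and the second term of `sensitivity` (the best mean normalized score reachable with a single setting shared across environments) is assumed from the Figure 3 and Figure 4 captions rather than stated in this summary. The function names and the `gamma[e][h]` layout are hypothetical.

```python
from statistics import mean


def percentile(values, q):
    """Linear-interpolated q-th percentile (0 <= q <= 100) of a list."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def normalize(scores, low_q=5, high_q=95):
    """Map the raw scores of one environment to [0, 1].

    ASSUMPTION: the 5th/95th percentile bounds and the clipping are
    illustrative choices, not taken from the summary.
    """
    lo, hi = percentile(scores, low_q), percentile(scores, high_q)
    if hi == lo:
        # Degenerate case: every score is identical.
        return [0.5 for _ in scores]
    return [min(1.0, max(0.0, (s - lo) / (hi - lo))) for s in scores]


def sensitivity(gamma):
    """Return (per_env_tuned, cross_env_tuned, phi).

    gamma[e][h] is the normalized score of hyperparameter setting h in
    environment e, for a single algorithm.
    """
    n_settings = len(gamma[0])
    # First term: tune separately in each environment, then average.
    per_env_tuned = mean(max(row) for row in gamma)
    # ASSUMED second term: the best single setting shared by all environments.
    cross_env_tuned = max(
        mean(row[h] for row in gamma) for h in range(n_settings)
    )
    return per_env_tuned, cross_env_tuned, per_env_tuned - cross_env_tuned


# Two environments, three settings: settings 0 and 1 each excel in one
# environment only, while setting 2 is a compromise.
gamma = [[1.0, 0.0, 0.6],
         [0.0, 1.0, 0.6]]
per_env, cross_env, phi = sensitivity(gamma)
print(per_env, cross_env, phi)
```

Under these assumptions, a larger gap between the two tuned scores means the algorithm gains more from per-environment tuning, which is the sense in which it is more sensitive.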

Abstract

The performance of modern reinforcement learning algorithms critically relies on tuning ever-increasing numbers of hyperparameters. Often, small changes in a hyperparameter can lead to drastic changes in performance, and different environments require very different hyperparameter settings to achieve state-of-the-art performance reported in the literature. We currently lack a scalable and widely accepted approach to characterizing these complex interactions. This work proposes a new empirical methodology for studying, comparing, and quantifying the sensitivity of an algorithm's performance to hyperparameter tuning for a given set of environments. We then demonstrate the utility of this methodology by assessing the hyperparameter sensitivity of several commonly used normalization variants of PPO. The results suggest that several algorithmic performance improvements may, in fact, be a result of an increased reliance on hyperparameter tuning.

Paper Structure

This paper contains 17 sections, 4 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: A count of hyperparameters for different reinforcement learning algorithms proposed over the last decade. We include value-based, policy-gradient, and model-based methods. The counts do not include hyperparameters controlling the network architectures, such as number of layers, activation functions, etc. See Appendix \ref{proliferation} for details on how hyperparameters were counted.
  • Figure 2: Left: The distributions of performance (AUC) over 625 hyperparameter settings for the PPO algorithm in Swimmer and Halfcheetah Brax environments. Right: The same distributions after applying score normalization. Each data point is the mean AUC across runs. Each run consisted of 3M steps of agent-environment interaction.
  • Figure 3: The distributions of environment-normalized scores for 625 hyperparameter settings of the PPO algorithm in the Swimmer and Halfcheetah environments. The red stars indicate the normalized environment scores of a hyperparameter setting that does well in Halfcheetah but poorly in Swimmer. The blue stars indicate the normalized scores of the hyperparameter setting that maximizes the mean of the normalized environment scores across both environments.
  • Figure 4: The performance-sensitivity plane for algorithmic evaluation. The center point indicates the hyperparameter sensitivity and performance of a reference algorithm. The x-axis is the hyperparameter sensitivity metric as defined in equation \ref{eq:sensitivity}. The y-axis is the per-environment tuned score (first term in equation \ref{eq:sensitivity}). The diagonal line is the identity line shifted to intersect the reference algorithm. The plane is then divided into 5 shaded regions that represent spaces of algorithms of varying quality relative to the baseline.
  • Figure 5: Performance-sensitivity plane with unnormalized PPO as the center reference point and variants of PPO plotted around it. The x-axis indicates hyperparameter sensitivity as defined in equation \ref{eq:sensitivity}. The y-axis represents the per-environment tuned score (first term in the sensitivity calculation of equation \ref{eq:sensitivity}). Hyperparameter sensitivity and per-environment tuned score metrics were computed from a 200-run sweep of 625 hyperparameter settings across 5 Brax MuJoCo environments (Ant, Halfcheetah, Hopper, Swimmer, and Walker2d). Error bars show the endpoints of 10,000-sample 95% bootstrap confidence intervals around both the performance and hyperparameter sensitivity metrics (two dimensions).
  • ...and 3 more figures
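
The error bars described for Figure 5 can be illustrated with a short Python sketch of a bootstrap confidence interval. The caption does not say which bootstrap variant was used or exactly what was resampled, so this assumes a plain percentile bootstrap over a list of per-run values. The function name `bootstrap_ci` and its parameters are hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(samples, statistic=mean, n_resamples=10_000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for statistic(samples).

    ASSUMPTION: a percentile bootstrap that resamples the given values with
    replacement. The procedure used for the figure may differ.
    """
    rng = random.Random(seed)
    n = len(samples)
    # Recompute the statistic on n_resamples resampled datasets of size n.
    stats = sorted(
        statistic([samples[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    alpha = (1.0 - level) / 2.0
    lo_idx = int(alpha * (n_resamples - 1))
    hi_idx = int((1.0 - alpha) * (n_resamples - 1))
    return stats[lo_idx], stats[hi_idx]


# Hypothetical per-run scores for one algorithm; not data from the paper.
runs = [0.62, 0.71, 0.58, 0.66, 0.74, 0.69, 0.60, 0.65]
low, high = bootstrap_ci(runs)
print(f"mean={mean(runs):.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

To get intervals in both dimensions of the plane, the same resampled runs would be used to recompute the tuned score and the sensitivity on each resample, with the percentile endpoints then taken separately for each metric.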