Hyper: Hyperparameter Robust Efficient Exploration in Reinforcement Learning
Yiran Wang, Chenshu Liu, Yunfan Li, Sanae Amani, Bolei Zhou, Lin F. Yang
TL;DR
Hyper addresses the hyperparameter sensitivity of curiosity-driven exploration in reinforcement learning with a repositioning-based framework that decouples exploration from exploitation and regularizes exploration visitation. A variant, Linear-UCB-Hyper, is provably efficient under linear MDP assumptions, and the method is empirically robust across diverse tasks, maintaining good performance over wide ranges of the curiosity coefficient $\beta$. By isolating task learning from exploration and truncating the repositioning length with a bounded geometric distribution, Hyper mitigates the instability caused by large intrinsic rewards and distribution shift. Empirically, Hyper matches or surpasses the TD3, Curiosity, and Decouple baselines on both exploration-heavy and sparse-reward tasks, with notably lower sensitivity to hyperparameters and improved stability, which makes it practical for robust exploration in RL.
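To make the "bounded geometric distribution" concrete, here is a minimal sketch of one way to draw a truncated repositioning length. It is an illustration under stated assumptions, not the authors' implementation: the function name `sample_reposition_length`, the stopping probability `p`, and the cap `max_len` are hypothetical. The summary does not say whether the bound is applied by clipping or by renormalizing the distribution; this sketch clips.

```python
import random


def sample_reposition_length(p: float, max_len: int, rng: random.Random) -> int:
    """Draw a step count from a geometric distribution clipped at max_len.

    Hypothetical helper: `p` is the per-step stopping probability and
    `max_len` is the upper bound on the length. Both are illustrative
    parameters, not values taken from the paper.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if max_len < 1:
        raise ValueError("max_len must be at least 1")

    length = 1
    # Continue with probability (1 - p) until a stop event occurs
    # or the cap is reached, whichever comes first.
    while length < max_len and rng.random() >= p:
        length += 1
    return length


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    lengths = [sample_reposition_length(0.1, 50, rng) for _ in range(1000)]
    print("min:", min(lengths), "max:", max(lengths))
```

Clipping keeps every sampled length within `[1, max_len]`, so a rare long draw cannot dominate an episode. The excess probability mass lands on `max_len` itself, which a renormalized truncation would instead spread across all lengths.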
Abstract
The exploration-exploitation dilemma poses significant challenges in reinforcement learning (RL). Recently, curiosity-based exploration methods have achieved great success in tackling hard-exploration problems. However, they require extensive hyperparameter tuning for each environment, which heavily limits the applicability and accessibility of this line of methods. In this paper, we characterize this problem through an analysis of agent behavior and conclude that choosing a proper hyperparameter is fundamentally difficult. We then identify the difficulty and instability of the optimization when the agent learns with curiosity. We propose hyperparameter robust exploration (\textbf{Hyper}), which substantially mitigates the problem by effectively regularizing the visitation of the exploration and decoupling the exploitation to ensure stable training. We theoretically show that \textbf{Hyper} is provably efficient in the function approximation setting and empirically demonstrate its appealing performance and robustness in various environments.
