Is Exploration All You Need? Effective Exploration Characteristics for Transfer in Reinforcement Learning
Jonathan C. Balloch, Rishav Bhagat, Geigh Zollicoffer, Ruoran Jia, Julia Kim, Mark O. Riedl
TL;DR
This work studies which exploration characteristics enable efficient online transfer in deep reinforcement learning when the environment changes through non-stationary novelties. It conducts a large-scale empirical study that evaluates eleven exploration algorithms on diverse two-environment transfer problems. It also introduces a taxonomy that classifies algorithms by exploration principle (stochasticity, explicit diversity, separate objective), by temporal locality, and by algorithmic instantiation. Using metrics such as convergence efficiency, adaptive efficiency, final adaptive performance, and Tr-AUC, it finds that explicit diversity and stochasticity are the characteristics that most consistently benefit transfer across novelties and environments, whereas the benefits of time-dependent exploration vary by task and novelty type. The results offer practical guidance for selecting and combining exploration characteristics to improve online task transfer in real-world, non-stationary RL settings, and they suggest directions for dynamic, transfer-aware exploration design.
Abstract
In deep reinforcement learning (RL) research, there has been a concerted effort to design more efficient and productive exploration methods for solving sparse-reward problems. These exploration methods often share common principles (e.g., improving diversity) and implementation details (e.g., intrinsic reward). Prior work found that non-stationary Markov decision processes (MDPs) require exploration to adapt efficiently to changes in the environment through online transfer learning. However, the relationship between specific exploration characteristics and effective transfer learning in deep RL has not been characterized. In this work, we seek to understand how salient exploration characteristics relate to performance and efficiency in transfer learning. We test eleven popular exploration algorithms on a variety of transfer types (or "novelties") to identify the characteristics that positively affect online transfer learning. Our analysis shows that some characteristics correlate with improved performance and efficiency across a wide range of transfer tasks, while others improve transfer performance only for specific environment changes. Based on this analysis, we recommend which exploration algorithm characteristics are best suited to specific transfer situations.
