Adaptable Hindsight Experience Replay for Search-Based Learning
Alexandros Vazaios, Jannis Brugger, Cedric Derstroff, Kristian Kersting, Mira Mezini
TL;DR
Sparse reward signals hinder training in neural-guided MCTS. The authors propose Adaptable Hindsight Experience Replay (AHER), a unifying framework that parameterizes HER across four properties and integrates it with AlphaZero-like search. Across bit-flipping, point maze, and equation discovery, AHER demonstrates that customizing HER configurations yields improvements over pure reinforcement learning or supervised learning, with task-dependent optimal settings. The work highlights both the potential and limitations of HER in neural-guided search and points to future directions such as probabilistic transitions and curriculum-based relabeling.
Abstract
AlphaZero-like Monte Carlo Tree Search systems, originally introduced for two-player games, dynamically balance exploration and exploitation using neural network guidance. This combination also makes them suitable for classical search problems. However, the original method of training the network with simulation results is limited in sparse reward settings, especially in the early stages, when the network cannot yet provide useful guidance. Hindsight Experience Replay (HER) addresses this issue by relabeling unsuccessful trajectories from the search tree as supervised learning signals. We introduce Adaptable HER (AHER), a flexible framework that integrates HER with AlphaZero, allowing easy adjustments to HER properties such as relabeled goals, policy targets, and trajectory selection. Our experiments, including equation discovery, show that the ability to modify HER is beneficial and that the resulting configurations surpass the performance of pure supervised or reinforcement learning.
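To make the relabeling idea concrete, the following is a minimal sketch of hindsight relabeling on a bit-flipping task, and not the authors' implementation. It assumes a single fixed choice for each adjustable property: the relabeled goal is the final state the trajectory actually reached, the policy target is one-hot on the action taken, and the value target is 1.0 because the trajectory succeeds under the hindsight goal. The names `flip`, `relabel_final`, and the sample layout are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

State = Tuple[int, ...]
# Hypothetical sample layout: (state, goal, policy_target, value_target)
Sample = Tuple[State, State, List[float], float]


def flip(state: State, action: int) -> State:
    """Bit-flipping transition: action i toggles bit i."""
    return state[:action] + (1 - state[action],) + state[action + 1:]


def relabel_final(start: State, actions: List[int]) -> List[Sample]:
    """Relabel a trajectory with the state it actually reached as its goal.

    Assumptions (illustrative, not taken from the paper):
    - goal selection: the final achieved state of the trajectory
    - policy target: one-hot on the action that was taken
    - value target: 1.0, since the hindsight goal is reached by construction
    """
    n_actions = len(start)

    # Replay the trajectory and record the state each action was taken in.
    visited: List[Tuple[State, int]] = []
    state = start
    for action in actions:
        visited.append((state, action))
        state = flip(state, action)
    achieved_goal = state

    # Turn every step into a supervised sample conditioned on the achieved goal.
    samples: List[Sample] = []
    for step_state, action in visited:
        policy_target = [0.0] * n_actions
        policy_target[action] = 1.0
        samples.append((step_state, achieved_goal, policy_target, 1.0))
    return samples
```

For example, a trajectory that starts at `(0, 0, 0)` and flips bits 0 and 2 misses any other original goal, yet `relabel_final((0, 0, 0), [0, 2])` still produces two training samples whose goal is the reached state `(1, 0, 1)`. In an adaptable setting, the goal selection, the policy target, and the choice of which trajectories to relabel would each be swappable rather than fixed as they are here.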
