Vanishing Bias Heuristic-guided Reinforcement Learning Algorithm
Qinru Li, Hao Xiang
TL;DR
The paper studies learning to land a lunar module. It evaluates classical (Q-Learning, SARSA, Monte Carlo) and neural (DQN, Double DQN, Clipped DQN) RL methods and introduces Heuristic RL, an approach that injects a non-learnable heuristic during early training and gradually reduces its influence (vanishing bias). The main contribution is a heuristic-guided augmentation that steers exploration toward the goal without embedding lasting human bias. It is validated on both tile-coded classical methods and deep Q-learning variants. The heuristic-guided DQN variants achieve the highest success rates and scores, improve considerably over their non-heuristic counterparts, and learn faster and more stably. The approach offers a practical way to bootstrap RL in sparse-reward or complex environments. It can be applied to other domains by replacing the heuristic with domain knowledge and then decaying the bias so that the agent relies on data-driven learning.
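The TL;DR describes the vanishing-bias mechanism only at a high level. One way to realize a non-learnable heuristic whose influence fades is to add a fixed bonus to the heuristic's preferred action during greedy action selection and to shrink that bonus geometrically with the episode count. The sketch below assumes exactly this form: the schedule `beta0 * decay**episode`, the names `heuristic_weight`, `select_action`, and `heuristic_action`, and the default values are all hypothetical and are not taken from the paper, whose mechanism may differ (for example, by executing heuristic actions with a decaying probability instead).

```python
import random
from typing import Optional, Sequence


def heuristic_weight(episode: int, beta0: float = 1.0, decay: float = 0.99) -> float:
    """Hypothetical bias schedule beta = beta0 * decay**episode; shrinks toward 0 over training."""
    return beta0 * decay ** episode


def select_action(
    q_values: Sequence[float],
    heuristic_action: int,
    episode: int,
    epsilon: float = 0.05,
    beta0: float = 1.0,
    decay: float = 0.99,
    rng: Optional[random.Random] = None,
) -> int:
    """Epsilon-greedy over Q-values plus a decaying bonus on the heuristic's preferred action."""
    rng = rng if rng is not None else random.Random()
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # uniform exploration, unaffected by the heuristic
    beta = heuristic_weight(episode, beta0, decay)
    # The heuristic adds a fixed, non-learnable bonus; only the Q-values are ever trained.
    scores = [q + (beta if a == heuristic_action else 0.0) for a, q in enumerate(q_values)]
    return max(range(len(scores)), key=scores.__getitem__)


q = [0.2, 0.5, 0.1, 0.3]  # one value per discrete Lunar Lander action
print(select_action(q, heuristic_action=2, episode=0, epsilon=0.0))    # 2: bias of 1.0 dominates
print(select_action(q, heuristic_action=2, episode=500, epsilon=0.0))  # 1: bias ~0.0066, Q-values decide
```

In this form the heuristic (for instance, a hand-written landing controller) only shapes which transitions get collected. In off-policy methods such as Q-learning or DQN it never enters the learning target, so once `beta` is negligible the policy is driven entirely by the learned estimates.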
Abstract
Reinforcement Learning has achieved tremendous success in many Atari games. In this paper we explore the lunar lander environment and implement classical methods, including Q-Learning, SARSA, and Monte Carlo (MC), together with tile coding. We also implement neural-network-based methods, including DQN, Double DQN, and Clipped DQN. On top of these, we propose a new algorithm called Heuristic RL, which uses a heuristic to guide early-stage training while alleviating the human bias it introduces. Our experiments show promising results for the proposed methods in the lunar lander environment.
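The abstract lists tile coding as the function approximator for the classical methods. The sketch below shows one standard form of it (uniform grid tilings, each offset by a fraction of a tile width) feeding a linear, semi-gradient Q-learning update. The names `tile_indices`, `q_value`, and `q_learning_update`, the number of tilings, and the tiles per dimension are illustrative assumptions, not the authors' configuration. Weights are kept in sparse per-action dictionaries because a dense grid over Lunar Lander's 8-dimensional observation would need $(n+1)^8$ cells per tiling.

```python
from collections import defaultdict
from typing import DefaultDict, List, Sequence

Weights = List[DefaultDict[int, float]]  # one sparse weight table per discrete action


def tile_indices(state: Sequence[float], lows: Sequence[float], highs: Sequence[float],
                 num_tilings: int = 8, tiles_per_dim: int = 8) -> List[int]:
    """Return one active tile index per tiling for a continuous state.

    Tiling t is a uniform grid over [low, high] shifted by t/num_tilings of a tile
    width, so nearby states share most tiles. One extra tile per dimension absorbs the shift.
    """
    n = tiles_per_dim + 1              # tiles per dimension, including the spill-over tile
    per_tiling = n ** len(state)       # cells in one tiling; keeps tilings' indices disjoint
    active = []
    for t in range(num_tilings):
        flat = 0
        for x, lo, hi in zip(state, lows, highs):
            width = (hi - lo) / tiles_per_dim
            x = min(max(x, lo), hi)    # clip out-of-range observations into the box
            offset = t * width / num_tilings
            idx = min(int((x - lo + offset) // width), n - 1)
            flat = flat * n + idx      # row-major cell index within this tiling
        active.append(t * per_tiling + flat)
    return active


def q_value(weights: Weights, active: Sequence[int], action: int) -> float:
    """Linear Q(s, a): sum of this action's weights over the active tiles."""
    return sum(weights[action].get(i, 0.0) for i in active)


def q_learning_update(weights: Weights, active: Sequence[int], action: int, reward: float,
                      next_active: Sequence[int], done: bool,
                      alpha: float = 0.1, gamma: float = 0.99) -> None:
    """One semi-gradient Q-learning step toward r + gamma * max_b Q(s', b)."""
    target = reward
    if not done:
        target += gamma * max(q_value(weights, next_active, b) for b in range(len(weights)))
    td_error = target - q_value(weights, active, action)
    step = alpha / len(active)         # split the step size across the active tiles
    for i in active:
        weights[action][i] += step * td_error


lows, highs = [0.0, 0.0], [1.0, 1.0]
weights: Weights = [defaultdict(float) for _ in range(4)]  # 4 discrete actions
s = tile_indices([0.0, 0.0], lows, highs, num_tilings=4, tiles_per_dim=4)
print(s)  # [0, 25, 50, 75]
q_learning_update(weights, s, action=1, reward=1.0, next_active=s, done=True)
print(round(q_value(weights, s, 1), 6))  # 0.1
```

Dividing `alpha` by the number of tilings keeps the effective step on Q(s, a) equal to `alpha`, which is why one update with reward 1 moves the estimate from 0 to 0.1.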
