Vanishing Bias Heuristic-guided Reinforcement Learning Algorithm

Qinru Li; Hao Xiang

TL;DR

The paper tackles learning to land a lunar module by evaluating classical (Q-Learning, SARSA, Monte Carlo) and neural (DQN, Double DQN, Clipped DQN) RL methods, introducing a Heuristic RL approach that injects a non-learnable heuristic during early training and gradually reduces its influence (vanishing bias). The main contribution is the design of a heuristic-guided augmentation that steers exploration toward the goal without embedding lasting human bias, validated across both tile-coded classical methods and deep Q-learning variants. Results show that heuristic-guided DQN variants achieve the highest success rates and scores, with considerable improvement over non-heuristic counterparts and notable stabilization and acceleration of learning. The approach offers a practical way to bootstrap RL in sparse-reward or complex environments and can be applied to other domains by replacing the heuristic with domain knowledge, followed by bias decay to rely on data-driven learning.

Abstract

Reinforcement Learning has achieved tremendous success in many Atari games. In this paper we explore the lunar lander environment and implement classical methods, including Q-Learning, SARSA, and Monte Carlo (MC), together with tile coding. We also implement neural-network-based methods, including DQN, Double DQN, and Clipped DQN. On top of these, we propose a new algorithm called Heuristic RL, which uses a heuristic to guide early-stage training while alleviating the human bias that the heuristic introduces. Our experiments show promising results for the proposed methods in the lunar lander environment.
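
The abstract describes the idea only at a high level, so the following is a minimal sketch of one way a vanishing heuristic bias could enter action selection, not the authors' exact algorithm. It assumes the heuristic is available as a per-action score that is added to the learned Q-values with a weight that decays geometrically over episodes. The names `heuristic_weight` and `select_action`, the parameters `initial_weight` and `decay`, and the geometric schedule itself are all hypothetical choices made for illustration.

```python
import random


def heuristic_weight(episode, initial_weight=1.0, decay=0.99):
    """Hypothetical schedule: the heuristic's weight shrinks geometrically."""
    return initial_weight * decay ** episode


def select_action(q_values, heuristic_scores, episode, epsilon=0.1, rng=random):
    """Epsilon-greedy choice over Q-values plus a decaying heuristic bonus.

    q_values         -- learned action values for the current state
    heuristic_scores -- fixed, non-learnable preference for each action
    episode          -- current training episode, which drives the decay
    """
    # Explore uniformly at random with probability epsilon.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Otherwise act greedily on the heuristic-augmented values.
    beta = heuristic_weight(episode)
    guided = [q + beta * h for q, h in zip(q_values, heuristic_scores)]
    return max(range(len(guided)), key=guided.__getitem__)


# Illustrative values only: the heuristic prefers action 2, the Q-values prefer action 1.
q = [0.0, 0.2, 0.1, 0.0]
h = [0.0, 0.0, 1.0, 0.0]
early = select_action(q, h, episode=0, epsilon=0.0)
late = select_action(q, h, episode=1000, epsilon=0.0)
```

In this sketch the heuristic dominates the choice at episode 0, while by episode 1000 its weight is negligible and the learned Q-values decide. That is the sense in which the bias vanishes: the heuristic shapes early exploration but leaves no lasting imprint on the final policy.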

Paper Structure (19 sections, 11 equations, 2 figures, 3 tables, 9 algorithms)

Introduction
Related work
Problem Formulation
Technical Approach
Game Environment
Q Learning
SARSA
Monte Carlo
Deep Q-Learning
Double DQN
Clipped Double Q-Learning
Heuristic-guided algorithms
Experiments
Metrics
Neural Network Setup
...and 4 more sections

Figures (2)

Figure 1: Average score vs. training episodes with a batch size of 64. The average score is plotted for each episode.
Figure 2: Average score vs. training episodes with a batch size of 1024. The average score is plotted for every 100 episodes.
