Smart Sampling: Self-Attention and Bootstrapping for Improved Ensembled Q-Learning
Muhammad Junaid Khan, Syed Hammad Ahmed, Gita Sukthankar
TL;DR
The paper addresses the challenge of sample-efficient learning in ensemble Q-learning by integrating bootstrapping of experiences with multi-head self-attention (MHA) inside each Q-learner, forming a variant of the REDQ/DroQ framework. The method preserves the ensemble structure (with $N$ learners and subset size $M$ for in-target minimization) while incorporating dropout, layer normalization, a fully connected pre-layer, and a multi-head attention module (8 heads, embedding $d \
Abstract
We present a novel method for enhancing the sample efficiency of ensemble Q-learning. Our approach integrates multi-head self-attention into the ensembled Q-networks while bootstrapping the state-action pairs ingested by the ensemble. This yields performance improvements over the original REDQ (Chen et al. 2021) and its variant DroQ (Hiraoka et al. 2022), enhancing Q predictions, and it reduces both the average and the standard deviation of the normalized bias within the Q-function ensemble. The method also performs well when the update-to-data (UTD) ratio is low. It is straightforward to implement, requiring only minimal modifications to the base model.
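The two ensemble ingredients mentioned above, bootstrapped inputs and REDQ-style in-target minimization over a random subset of $M$ of the $N$ learners, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the resample-with-replacement scheme, and the values of `N`, `M`, and `gamma` are assumptions chosen for illustration. The Q-networks are replaced by stand-in numbers, and the attention module and the entropy term of the SAC-based target are omitted.

```python
import random


def bootstrap_indices(batch_size, rng):
    """Draw one bootstrap resample: batch indices sampled with replacement.

    Assumption: each learner sees its own resample of the minibatch; the
    paper's exact bootstrapping scheme may differ.
    """
    return [rng.randrange(batch_size) for _ in range(batch_size)]


def in_target_min(q_values, subset_size, rng):
    """Minimum over a random subset of `subset_size` learners' Q-values."""
    chosen = rng.sample(range(len(q_values)), subset_size)
    return min(q_values[i] for i in chosen)


def td_target(reward, done, next_q_values, subset_size, gamma, rng):
    """y = r + gamma * (1 - done) * min_{i in subset} Q_i(s', a').

    The entropy bonus of the SAC-based target is left out for brevity.
    """
    bootstrap_value = in_target_min(next_q_values, subset_size, rng)
    return reward + gamma * (1.0 - done) * bootstrap_value


rng = random.Random(0)
N, M = 10, 2  # illustrative ensemble size and in-target subset size

# Stand-in target-network outputs Q_i(s', a') for a single transition.
next_q = [1.0 + 0.1 * i for i in range(N)]

y = td_target(reward=0.5, done=0.0, next_q_values=next_q,
              subset_size=M, gamma=0.99, rng=rng)

# One bootstrap resample of a 4-transition minibatch per learner.
per_learner_batches = [bootstrap_indices(4, rng) for _ in range(N)]
```

Taking the minimum over a small random subset, rather than over all $N$ learners, is what keeps the target from becoming overly pessimistic. Setting `subset_size` to `N` in this sketch recovers the full-ensemble minimum.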
