Value of Information-Enhanced Exploration in Bootstrapped DQN

Stergios Plataniotis; Charilaos Akasiadis; Georgios Chalkiadakis

TL;DR

Exploration in deep reinforcement learning remains challenging, particularly in sparse-reward environments. The authors introduce two EVOI-based extensions to Bootstrapped DQN, BootDQN-Gain and BootDQN-EVOI, which quantify the information gain implied by disagreement among ensemble heads and integrate it into action selection. They report improved performance on several hard Atari games and increased diversity among ensemble heads, without adding extra hyperparameters. By leveraging the value of information of actions across the ensemble, the approach offers a principled route to more efficient, information-driven exploration in high-dimensional deep RL.

Abstract

Efficient exploration in deep reinforcement learning remains a fundamental challenge, especially in environments with high-dimensional states and sparse rewards. Traditional exploration strategies that rely on random local policy noise, such as $\epsilon$-greedy and Boltzmann exploration, often struggle to balance exploration and exploitation efficiently. In this paper, we integrate the notion of (expected) value of information (EVOI) into the well-known Bootstrapped DQN algorithmic framework to enhance the algorithm's deep exploration ability. Specifically, we develop two novel algorithms that incorporate the expected gain from learning, as quantified by the value of information, into Bootstrapped DQN. Our methods use value of information estimates to measure the disagreement among distinct network heads and drive exploration towards the areas with the most potential. We evaluate our algorithms with respect to performance and their ability to exploit the inherent uncertainty arising from random network initialization. Our experiments in complex, sparse-reward Atari games demonstrate increased performance and better use of uncertainty, and, importantly, do so without introducing extra hyperparameters.
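The abstract does not state the exact BootDQN-Gain or BootDQN-EVOI formulas, so the following is only a hedged illustration of the general idea, not the authors' implementation. It assumes that each head's Q-value is one sample of an uncertain Q(s, a) and applies the classic myopic value of perfect information (VPI) from Bayesian Q-learning (Dearden, Friedman and Russell, 1998), a swapped-in technique chosen for illustration. The action is then chosen to maximize the ensemble-mean Q plus its VPI bonus. For contrast, `bootdqn_action` shows vanilla Bootstrapped DQN, which acts greedily with respect to a single head sampled per episode. All function names are hypothetical.

```python
from statistics import mean


def myopic_vpi(q_samples):
    """Myopic value of perfect information (VPI) for every action.

    Hypothetical helper: each ensemble head's Q-value is treated as one
    sample of the uncertain Q(s, a), following the myopic VPI of
    Dearden, Friedman and Russell (1998). q_samples[k][a] is head k's
    estimate of Q(s, a). Returns (means, vpi), both indexed by action.
    """
    n_actions = len(q_samples[0])
    if n_actions < 2:
        raise ValueError("VPI needs at least two actions")
    means = [mean(head[a] for head in q_samples) for a in range(n_actions)]
    # Best (a1) and second-best (a2) actions under the ensemble mean.
    order = sorted(range(n_actions), key=lambda a: means[a], reverse=True)
    a1, a2 = order[0], order[1]
    vpi = []
    for a in range(n_actions):
        gains = []
        for head in q_samples:
            q = head[a]
            if a == a1:
                # Gain if the current best turns out worse than the runner-up.
                gains.append(max(means[a2] - q, 0.0))
            else:
                # Gain if this action turns out better than the current best.
                gains.append(max(q - means[a1], 0.0))
        vpi.append(mean(gains))
    return means, vpi


def vpi_action(q_samples):
    """Pick the action maximizing ensemble-mean Q plus its VPI bonus."""
    means, vpi = myopic_vpi(q_samples)
    scores = [m + v for m, v in zip(means, vpi)]
    return max(range(len(scores)), key=scores.__getitem__)


def bootdqn_action(q_samples, head_index):
    """Vanilla Bootstrapped DQN: act greedily w.r.t. one head, which is
    drawn uniformly at random at the start of each episode."""
    head = q_samples[head_index]
    return max(range(len(head)), key=head.__getitem__)


# Three heads agree on action 0 but disagree about action 1.
q = [[1.0, 0.0], [1.0, 2.0], [1.0, 0.5]]
means, vpi = myopic_vpi(q)
print(vpi_action(q))  # prints 1
```

In this toy case all heads value action 0 at 1.0, while their estimates for action 1 (0.0, 2.0, 0.5) average about 0.83. A purely mean-greedy agent would pick action 0, but the disagreement on action 1 yields a VPI bonus of 1/3, which tips the choice towards the more informative action. This is the intuition behind using head disagreement as a value-of-information signal, without any extra exploration hyperparameter.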
