Investigating Intra-Abstraction Policies For Non-exact Abstraction Algorithms

Robin Schmöcker; Alexander Dockhorn; Bodo Rosenhahn

Investigating Intra-Abstraction Policies For Non-exact Abstraction Algorithms

Robin Schmöcker, Alexander Dockhorn, Bodo Rosenhahn

TL;DR

This paper proposes and empirically evaluates several alternative intra-abstraction policies, several of which outperform the random policy across a majority of environments and parameter settings.

Abstract

One weakness of Monte Carlo Tree Search (MCTS) is its sample efficiency which can be addressed by building and using state and/or action abstractions in parallel to the tree search such that information can be shared among nodes of the same layer. The primary usage of abstractions for MCTS is to enhance the Upper Confidence Bound (UCB) value during the tree policy by aggregating visits and returns of an abstract node. However, this direct usage of abstractions does not take the case into account where multiple actions with the same parent might be in the same abstract node, as these would then all have the same UCB value, thus requiring a tiebreak rule. In state-of-the-art abstraction algorithms such as pruned On the Go Abstractions (pruned OGA), this case has not been noticed, and a random tiebreak rule was implicitly chosen. In this paper, we propose and empirically evaluate several alternative intra-abstraction policies, several of which outperform the random policy across a majority of environments and parameter settings.

Investigating Intra-Abstraction Policies For Non-exact Abstraction Algorithms

TL;DR

Abstract

Investigating Intra-Abstraction Policies For Non-exact Abstraction Algorithms

TL;DR

Abstract

Paper Structure

Table of Contents

Figures (33)