Bounding-Box Inference for Error-Aware Model-Based Reinforcement Learning

Erin J. Talvitie, Zilei Shao, Huiying Li, Jinghan Hu, Jacob Boerma, Rory Zhao, Xintong Wang

TL;DR

Bounding-box inference, which operates on bounding boxes around sets of possible states and other quantities, is proposed and evaluated; it is found to reliably support effective selective planning.

Abstract

In model-based reinforcement learning, simulated experiences from the learned model are often treated as equivalent to experience from the real environment. However, when the model is inaccurate, it can catastrophically interfere with policy learning. Alternatively, the agent might learn about the model's accuracy and selectively use it only when it can provide reliable predictions. We empirically explore model uncertainty measures for selective planning and show that the best results require distribution-insensitive inference to estimate the uncertainty over model-based updates. To that end, we propose and evaluate bounding-box inference, which operates on bounding boxes around sets of possible states and other quantities. We find that bounding-box inference can reliably support effective selective planning.
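
To make the idea of operating on bounding boxes concrete, the following is a minimal sketch, not the authors' implementation. It assumes a linear action-value function per action and an axis-aligned box of possible next states, and uses standard interval arithmetic to bound a one-step target of the form r + gamma * max_a Q(s', a). The names `linear_bounds`, `target_bounds`, `q_weights`, and `q_biases` are hypothetical. The bounds depend only on the extent of the box, not on any distribution over states inside it, and the width of the resulting interval is one possible uncertainty signal for deciding how much to trust a model-based update.

```python
import numpy as np


def linear_bounds(w, b, lo, hi):
    """Tightest bounds on w @ s + b over all s in the box [lo, hi].

    Assumption: Q is linear in the state; this is for illustration only.
    """
    w = np.asarray(w, dtype=float)
    lo = np.asarray(lo, dtype=float)
    hi = np.asarray(hi, dtype=float)
    w_pos = np.maximum(w, 0.0)  # positive weights reach their minimum at lo
    w_neg = np.minimum(w, 0.0)  # negative weights reach their minimum at hi
    lower = w_pos @ lo + w_neg @ hi + b
    upper = w_pos @ hi + w_neg @ lo + b
    return float(lower), float(upper)


def target_bounds(r_lo, r_hi, s_lo, s_hi, q_weights, q_biases, gamma):
    """Bounds on r + gamma * max_a Q(s', a) for r in [r_lo, r_hi], s' in the box.

    Assumes gamma >= 0. The max over actions is bounded below by the largest
    per-action lower bound and above by the largest per-action upper bound.
    """
    per_action = [
        linear_bounds(w, b, s_lo, s_hi) for w, b in zip(q_weights, q_biases)
    ]
    max_lo = max(lower for lower, _ in per_action)
    max_hi = max(upper for _, upper in per_action)
    return r_lo + gamma * max_lo, r_hi + gamma * max_hi


# Hypothetical two-action, two-dimensional example.
q_weights = [[1.0, -2.0], [0.0, 1.0]]
q_biases = [0.5, 0.0]
t_lo, t_hi = target_bounds(
    r_lo=0.0, r_hi=1.0,
    s_lo=[0.0, 0.0], s_hi=[1.0, 1.0],
    q_weights=q_weights, q_biases=q_biases, gamma=0.9,
)
width = t_hi - t_lo  # wider interval = less certain model-based target
```

A selective planner could, for instance, down-weight or skip updates whose interval width exceeds some threshold; how the width is turned into a weight is a design choice not specified here.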

Paper Structure (38 sections, 5 equations, 12 figures, 16 tables)

Figures (12)

  • Figure 1: Left: an illustration of the Go-Right domain. Right: Results of unselective MVE planning in Go-Right. The curves are smoothed so that each point is the average of the previous 100 episode scores. The shaded regions represent the (smoothed) standard error at each point.
  • Figure 2: Selective planning with hand-coded models in Go-Right (left) and Go-Right-10 (right).
  • Figure 3: Selective planning with decision tree models in Go-Right (left) and Go-Right-10 (right).
  • Figure 4: Selective planning with neural network models in Go-Right (left) and Go-Right-10 (right).
  • Figure 5: Planning with decision tree models in Acrobot (left) and Distractrobot (right). As above, curves are smoothed over 100 episodes and the shaded region represents standard error.
  • ...and 7 more figures
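
The smoothing described in the Figure 1 and Figure 5 captions can be sketched as follows. This is one reading of the captions, not the authors' plotting code: each point is taken to be the mean of the most recent `window` episode scores (including the current one, and fewer at the start of a run), and the shaded band is taken to be the standard error across independent runs of the smoothed curves. The function names and the array layout (runs by episodes) are assumptions.

```python
import numpy as np


def trailing_mean(scores, window=100):
    """Mean of the most recent `window` scores at each episode.

    Assumption: early episodes average over however many scores exist so far.
    """
    scores = np.asarray(scores, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(scores)))
    end = np.arange(1, len(scores) + 1)
    start = np.maximum(end - window, 0)
    return (csum[end] - csum[start]) / (end - start)


def smoothed_mean_and_stderr(runs, window=100):
    """Smooth each run, then take the mean and standard error across runs.

    `runs` has shape (n_runs, n_episodes); requires n_runs >= 2.
    """
    smoothed = np.stack([trailing_mean(r, window) for r in runs])
    mean = smoothed.mean(axis=0)
    stderr = smoothed.std(axis=0, ddof=1) / np.sqrt(smoothed.shape[0])
    return mean, stderr


# Tiny hypothetical example with a window of 2 instead of 100.
example = trailing_mean([1.0, 2.0, 3.0, 4.0], window=2)
mean, stderr = smoothed_mean_and_stderr(
    np.array([[0.0, 2.0], [2.0, 2.0]]), window=2
)
```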