Bounding-Box Inference for Error-Aware Model-Based Reinforcement Learning

Erin J. Talvitie; Zilei Shao; Huiying Li; Jinghan Hu; Jacob Boerma; Rory Zhao; Xintong Wang

TL;DR

Bounding-box inference, which operates on bounding boxes around sets of possible states and other quantities, is proposed and evaluated. The evaluation finds that it can reliably support effective selective planning.

Abstract

In model-based reinforcement learning, simulated experiences from the learned model are often treated as equivalent to experience from the real environment. However, when the model is inaccurate, it can catastrophically interfere with policy learning. Alternatively, the agent might learn about the model's accuracy and selectively use it only when it can provide reliable predictions. We empirically explore model uncertainty measures for selective planning and show that the best results require distribution-insensitive inference to estimate the uncertainty over model-based updates. To that end, we propose and evaluate bounding-box inference, which operates on bounding boxes around sets of possible states and other quantities. We find that bounding-box inference can reliably support effective selective planning.
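To make the idea of operating on bounding boxes concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical linear one-step model with a fixed additive error bound (`weights`, `bias`, and `noise` are illustrative names), and it uses a simple width threshold to decide how far a rollout can be trusted. The threshold rule is an assumption made for illustration and is not taken from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box: per-dimension lower and upper bounds."""
    lo: tuple
    hi: tuple


def step_bounds(box, weights, bias, noise):
    """Propagate a box through a hypothetical model s' = W s + b +/- noise.

    Uses interval arithmetic: a non-negative weight maps the lower bound to
    the lower bound, a negative weight swaps the two ends.
    """
    lo, hi = [], []
    for row, b in zip(weights, bias):
        low, high = b - noise, b + noise
        for w, l, h in zip(row, box.lo, box.hi):
            low += w * l if w >= 0 else w * h
            high += w * h if w >= 0 else w * l
        lo.append(low)
        hi.append(high)
    return Box(tuple(lo), tuple(hi))


def width(box):
    """Largest per-dimension extent of the box (a crude uncertainty measure)."""
    return max(h - l for l, h in zip(box.lo, box.hi))


def usable_horizon(start, weights, bias, noise, max_steps, max_width):
    """Count rollout steps whose box stays within max_width (assumed rule)."""
    box, steps = start, 0
    for _ in range(max_steps):
        box = step_bounds(box, weights, bias, noise)
        if width(box) > max_width:
            break
        steps += 1
    return steps


if __name__ == "__main__":
    start = Box((0.0,), (0.0,))  # a single known state
    print(usable_horizon(start, [[1.0]], [1.0], 0.5, max_steps=5, max_width=2.0))
```

Because the box only tracks extremes, the width does not depend on how probability mass is distributed inside it, which is one way to read "distribution-insensitive" in this toy setting.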

Paper Structure (38 sections, 5 equations, 12 figures, 16 tables)

Introduction
Problem Setting and Background
Sources of Model Error
Model-Based Value Expansion
Selective Model-Based Value Expansion
Experiments with Hand-Coded Models
The Go-Right Problem
Experimental Setup
Unselective Planning Results
Expectation Models
Sampling Models
One-Step Predicted Variance
Monte-Carlo Target Variance
Limitations of Target Variance
Monte Carlo Target Range
...and 23 more sections

Figures (12)

Figure 1: Left: an illustration of the Go-Right domain. Right: Results of unselective MVE planning in Go-Right. The curves are smoothed so that each point is the average of the previous 100 episode scores. The shaded regions represent the (smoothed) standard error at each point.
Figure 2: Selective planning with hand-coded models in Go-Right (left) and Go-Right-10 (right).
Figure 3: Selective planning with decision tree models in Go-Right (left) and Go-Right-10 (right).
Figure 4: Selective planning with neural network models in Go-Right (left) and Go-Right-10 (right).
Figure 5: Planning with decision tree models in Acrobot (left) and Distractrobot (right). As above, curves are smoothed over 100 episodes and the shaded region represents standard error.
...and 7 more figures
