Bayesian Inverse Reinforcement Learning for Non-Markovian Rewards

Noah Topper; Alvaro Velasquez; George Atia

TL;DR

This work proposes a Bayesian IRL (BIRL) framework for inferring reward machines (RMs) directly from expert behavior, which requires significant changes to the standard framework, and introduces a novel modification to simulated annealing to maximize the resulting reward posterior.

Abstract

Inverse reinforcement learning (IRL) is the problem of inferring a reward function from expert behavior. There are several approaches to IRL, but most are designed to learn a Markovian reward. However, a reward function may be non-Markovian, depending on more than just the current state; a reward machine (RM) is one such reward. Although there has been recent work on inferring RMs, it assumes access to the reward signal, which is absent in IRL. We propose a Bayesian IRL (BIRL) framework for inferring RMs directly from expert behavior, which requires significant changes to the standard framework. We define a new reward space, adapt the expert demonstration to include history, show how to compute the reward posterior, and propose a novel modification to simulated annealing to maximize this posterior. We demonstrate that our method performs well when optimizing according to its inferred reward and compares favorably to an existing method that learns exclusively binary non-Markovian rewards.
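To illustrate why a reward machine is non-Markovian, here is a minimal Python sketch of one as a finite-state machine over high-level labels. The class name `RewardMachine`, the labels `coffee`, `office`, and `empty`, the machine states, and the reward values are all hypothetical and not taken from the paper. The sketch shows only the reward representation, not the inference method.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine whose transitions emit rewards (illustrative only)."""
    initial: str
    # Maps (machine_state, label) -> (next_machine_state, reward).
    delta: dict = field(default_factory=dict)

    def run(self, labels):
        """Return the reward emitted at each step of a label sequence."""
        u, rewards = self.initial, []
        for label in labels:
            # Assumption: unlisted (state, label) pairs self-loop with zero reward.
            u, r = self.delta.get((u, label), (u, 0.0))
            rewards.append(r)
        return rewards


# Hypothetical task: pick up coffee first, then reach the office.
rm = RewardMachine(
    initial="u0",
    delta={
        ("u0", "coffee"): ("u1", 0.0),
        ("u1", "office"): ("u2", 1.0),
    },
)

# Both runs end with the same label, but only one visited "coffee" earlier.
with_coffee = rm.run(["coffee", "empty", "office"])
without_coffee = rm.run(["empty", "empty", "office"])
print(with_coffee, without_coffee)
```

Both sequences end on the same `office` label, yet only the run that saw `coffee` earlier earns a reward, so the reward depends on history rather than on the current state alone.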

Paper Structure (11 sections, 1 equation, 5 figures, 1 table, 1 algorithm)

Introduction
Preliminaries
Methodology
Non-Markovian BIRL Framework
Setting the Parameters and Schedules
Setting the Prior
Related Work
Experiments
Conclusion and Future Work
Experiments - Supplementary
Pseudocode

Figures (5)

Figure 1: Recharge gridworld environment from Vazquez2018
Figure 2: Office gridworld environment from ToroIcarte2020
Figure 3: Inferred reward machine for recharge gridworld.
Figure 4: Inferred reward machine for basic coffee gridworld.
Figure 5: Inferred reward machine for multi coffee gridworld.

Theorems & Definitions (4)

Definition 1
Definition 2
Definition 3
Definition 4
