Inverse Transition Learning: Learning Dynamics from Demonstrations

Leo Benac, Abhishek Sharma, Sonali Parbhoo, Finale Doshi-Velez

TL;DR

Across both synthetic environments and real healthcare scenarios such as Intensive Care Unit (ICU) management of patients with hypotension, this work demonstrates not only significant improvements in decision-making but also that the posterior can inform when transfer will be successful.

Abstract

We consider the problem of estimating the transition dynamics $T^*$ from near-optimal expert trajectories in the context of offline model-based reinforcement learning. We develop a novel constraint-based method, Inverse Transition Learning, that treats the limited coverage of the expert trajectories as a \emph{feature}: we use the fact that the expert is near-optimal to inform our estimate of $T^*$. We integrate our constraints into a Bayesian approach. Across both synthetic environments and real healthcare scenarios such as Intensive Care Unit (ICU) management of patients with hypotension, we demonstrate not only significant improvements in decision-making but also that our posterior can inform when transfer will be successful.

Paper Structure

This paper contains 34 sections, 4 theorems, 16 equations, 11 figures, 8 tables, 3 algorithms.

Key Result

Theorem 1

If $\pi_{\epsilon}(\cdot \mid \cdot; T^*)=\pi^*(T^*)$ and some dynamics $T$ satisfies constraints($\pi_{\epsilon}(\cdot \mid \cdot; T^*)$) for each state $s$, then $\pi^*(T) = \pi^*(T^*)$. Hence, $T$ will recover the optimal action $a^*$ with respect to the true transition dynamics $T^*$ for each state $s$. (Note t
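The conclusion of Theorem 1, $\pi^*(T) = \pi^*(T^*)$, can be checked numerically on a toy problem. The sketch below is only an illustration under stated assumptions, not the paper's constraint set or algorithm: it assumes a tabular MDP with a known reward array `R` and discount `gamma`, and the names `greedy_optimal_policy`, `same_optimal_policy`, and `make_dynamics` are hypothetical. It runs value iteration under a candidate $T$ and under $T^*$ and compares the resulting greedy policies.

```python
import numpy as np


def greedy_optimal_policy(T, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Run value iteration on a tabular MDP and return the greedy policy.

    T: array of shape (S, A, S) with T[s, a, s2] = P(s2 | s, a)
    R: array of shape (S, A), reward for taking action a in state s
    (Both the reward and the discount are assumptions of this sketch.)
    """
    V = np.zeros(T.shape[0])
    for _ in range(max_iter):
        Q = R + gamma * (T @ V)  # (S, A, S) @ (S,) -> (S, A)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * (T @ V)
    return Q.argmax(axis=1)  # ties are broken toward the lowest action index


def same_optimal_policy(T_candidate, T_true, R, gamma=0.9):
    """True if both dynamics induce the same greedy optimal action in every state."""
    pi_candidate = greedy_optimal_policy(T_candidate, R, gamma)
    pi_true = greedy_optimal_policy(T_true, R, gamma)
    return bool(np.array_equal(pi_candidate, pi_true))


def make_dynamics(p_a0, p_a1):
    """Two-state, two-action dynamics: action a reaches state 1 with
    probability p_a, regardless of the current state (toy example)."""
    T = np.zeros((2, 2, 2))
    T[:, 0, 1] = p_a0
    T[:, 0, 0] = 1 - p_a0
    T[:, 1, 1] = p_a1
    T[:, 1, 0] = 1 - p_a1
    return T


# Reward 1 for being in state 1, 0 for being in state 0 (same for both actions).
R = np.array([[0.0, 0.0], [1.0, 1.0]])

T_true = make_dynamics(0.9, 0.1)  # stands in for T*
T_hat = make_dynamics(0.7, 0.3)   # different probabilities, same action ranking
T_bad = make_dynamics(0.1, 0.9)   # action ranking reversed

print(same_optimal_policy(T_hat, T_true, R))
print(same_optimal_policy(T_bad, T_true, R))
```

In this toy case `T_hat` differs from `T_true` in every entry yet induces the same optimal action in both states, while `T_bad` does not. This mirrors the point of the theorem: a dynamics estimate does not need to match $T^*$ exactly to recover the optimal action.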

Figures (11)

  • Figure 1: Performance of ITL on a held out validation set across different $\epsilon$ values.
  • Figure 2: Top row: Normalized Value vs. Coverage for Gridworld (left: Standard Task, middle: Transfer Task), Bottom row: Normalized Value vs. Coverage for Randomworlds (left: Standard Task, middle: Transfer Task). Rightmost plots: Normalized Value vs. Bayesian Regret of both Tasks (top: Gridworld, bottom: Randomworlds).
  • Figure 3: The 3 most likely next states after prescribing intravenous treatment in state O2 = 1, BP = 1, GCS = 1, Crea = 2.
  • Figure 4: Example of a subspace of the feasible region defined by the constraints on $T$. The label 'tsas' indicates the probability of transitioning to state $s'$ after being in state $s$ and taking action $a$.
  • Figure 4: Gridworld and Randomworlds Results (40% stochastic-policy states)
  • ...and 6 more figures

Theorems & Definitions (12)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Theorem 1
  • proof
  • Lemma 1
  • proof
  • Theorem 1
  • proof
  • ...and 2 more