Partial Identifiability and Misspecification in Inverse Reinforcement Learning

Joar Skalse, Alessandro Abate

TL;DR

This paper fully characterises and quantifies the ambiguity of the reward function for all of the behavioural models that are most common in the current IRL literature, and introduces a cohesive framework for reasoning about partial identifiability and misspecification in IRL.

Abstract

The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function $R$ from a policy $π$. This problem is difficult, for several reasons. First of all, there are typically multiple reward functions which are compatible with a given policy; this means that the reward function is only *partially identifiable*, and that IRL contains a certain fundamental degree of ambiguity. Secondly, in order to infer $R$ from $π$, an IRL algorithm must have a *behavioural model* of how $π$ relates to $R$. However, the true relationship between human preferences and human behaviour is very complex, and practically impossible to fully capture with a simple model. This means that the behavioural model in practice will be *misspecified*, which raises the worry that it might lead to unsound inferences if applied to real-world data. In this paper, we provide a comprehensive mathematical analysis of partial identifiability and misspecification in IRL. Specifically, we fully characterise and quantify the ambiguity of the reward function for all of the behavioural models that are most common in the current IRL literature. We also provide necessary and sufficient conditions that describe precisely how the observed demonstrator policy may differ from each of the standard behavioural models before that model leads to faulty inferences about the reward function $R$. In addition to this, we introduce a cohesive framework for reasoning about partial identifiability and misspecification in IRL, together with several formal tools that can be used to easily derive the partial identifiability and misspecification robustness of new IRL models, or analyse other kinds of reward learning algorithms.

Paper Structure

This paper contains 68 sections, 149 theorems, 163 equations, 26 figures.

Key Result

Lemma 3

Consider two reward objects $f : \mathcal{R} \to X$, $g : \mathcal{R} \to Y$. If there exists a function $h : X \to Y$ such that $h \circ f = g$, then $\mathrm{Am}(f) \preceq \mathrm{Am}(g)$.
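
As a rough illustration of Lemma 3, the sketch below builds the partitions $\mathrm{Am}(f)$ and $\mathrm{Am}(g)$ on a small finite stand-in for $\mathcal{R}$ and checks the refinement directly. The toy reward space, the particular choices of `f` and `h`, and the helper names `ambiguity_partition` and `refines` are assumptions made for this example only; they do not come from the paper.

```python
from collections import defaultdict
from itertools import product


def ambiguity_partition(f, rewards):
    """Group rewards by their image under f: one block per distinct value of f."""
    blocks = defaultdict(set)
    for r in rewards:
        blocks[f(r)].add(r)
    return {frozenset(block) for block in blocks.values()}


def refines(p, q):
    """True if every block of p is contained in some block of q (p is finer than q)."""
    return all(any(a <= b for b in q) for a in p)


# Hypothetical toy reward space: rewards over two states with values in {-1, 0, 1}.
rewards = list(product([-1, 0, 1], repeat=2))


def f(r):
    """Toy reward object f: the difference between the two state rewards."""
    return r[0] - r[1]


def h(x):
    """Toy map h from the codomain of f to the codomain of g."""
    return abs(x)


def g(r):
    """Toy reward object g, defined so that g = h . f holds by construction."""
    return h(f(r))


am_f = ambiguity_partition(f, rewards)
am_g = ambiguity_partition(g, rewards)

print(refines(am_f, am_g))  # True: Am(f) refines Am(g), as Lemma 3 predicts
print(refines(am_g, am_f))  # False: g is strictly more ambiguous than f here
```

Because `g` is computed from `f` alone, any two rewards that `f` cannot distinguish are also indistinguishable under `g`, which is the content of the lemma. The converse fails in this toy case because `h` merges the values `1` and `-1`.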

Figures (26)

  • Figure 1: This figure illustrates Definitions \ref{def:reward_object_behavioural_model}-\ref{def:refinement} visually. Specifically, suppose $f : \mathcal{R} \to X$ and $g : \mathcal{R} \to Y$ are functions (or "reward objects" in our terminology). Now $f$ induces a partitioning $\mathrm{Am}(f)$ of $\mathcal{R}$ according to which $R_1$ and $R_2$ belong to the same partition if (and only if) $f(R_1) = f(R_2)$, and likewise for $g$ and $\mathrm{Am}(g)$. If $g(R_1) = g(R_2)$ whenever $f(R_1) = f(R_2)$, then $\mathrm{Am}(f)$ is a partition refinement of $\mathrm{Am}(g)$, which can be visualised as in the figure above. This corresponds to the case when $\mathrm{Am}(f) \preceq \mathrm{Am}(g)$, where $f$ is less ambiguous than $g$.
  • Figure 2: This figure gives a simple illustration of how Definitions \ref{def:reward_object_behavioural_model}-\ref{def:refinement} induce a partial order over objects that can be computed from reward functions. For example, let $q$ be the function that, given a reward function $R$, returns the optimal $Q$-function $Q^\star$, and let $v$ be the function that, given a reward function $R$, returns the optimal value-function $V^\star$. Since $V^\star$ can be computed from $Q^\star$, we have that $\mathrm{Am}(q) \preceq \mathrm{Am}(v)$, which can be represented as $q \to v$ (or $Q^\star \to V^\star$) in a figure. Important relationships between data sources can then be read out graphically; for example, if $Q^\star$ is too ambiguous for a given application, then $V^\star$ must be too ambiguous as well.
  • Figure 3: This figure illustrates the conditions in Definition \ref{def:misspecification_eq}. Both $f$ and $g$ are functions from the space of all rewards $\mathcal{R}$ to some set $X$, and $P$ is a partitioning of $\mathcal{R}$. The learning algorithm $\mathcal{L}$ observes $x = g(R^\star)$ for some unknown reward function $R^\star$, and will find a reward function $R_H$ such that $f(R_H) = x$. We wish to ensure that $R_H \equiv_{P} R^\star$. If this holds for all $R_H$ and $R^\star$ such that $f(R_H) = g(R^\star)$, together with the other conditions in Definition \ref{def:misspecification_eq}, then we say that $f$ is $P$-robust to misspecification with $g$.
  • Figure 4: This figure illustrates the conditions in Definition \ref{def:misspecification_metric}. Both $f$ and $g$ are functions from the space of all rewards $\mathcal{R}$ to some set $X$, and we have some pseudometric $d^\mathcal{R}$ on $\mathcal{R}$. The learning algorithm $\mathcal{L}$ observes $x = g(R^\star)$ for some unknown reward function $R^\star$, and will find a reward function $R_H$ such that $f(R_H) = x$. We wish to ensure that $d^\mathcal{R}(R_H, R^\star) \leq \epsilon$. If this holds for all $R_H$ and $R^\star$ such that $f(R_H) = g(R^\star)$, together with the other conditions in Definition \ref{def:misspecification_metric}, then we say that $f$ is $\epsilon$-robust to misspecification with $g$ (as measured by the pseudometric $d^\mathcal{R}$).
  • Figure 5: This figure summarises our results from Section \ref{sec:partial_identifiability} (also incorporating some results from Appendix \ref{appendix:reward_transformation_properties}). On the left-hand side, we list several reward objects and equivalence relations on $\mathcal{R}$. We write $f \to g$ if $\mathrm{Am}(f) \preceq \mathrm{Am}(g)$. Since ambiguity refinement is transitive and antisymmetric, this lets us place all reward objects in a lattice structure. Using this structure, we can read out several important relationships graphically: if $f \to g$, then a data source that is based on $g$ is at least as ambiguous as a data source based on $f$, the information contained in a data source based on $f$ is sufficient to derive the value of $g$ as an application, and it is in principle possible to compute $g$ based on $f$. Note that the lattice structure in this case forms a linear order; this is a special property of the reward objects and equivalence relations we have studied, and does not hold in general. On the right-hand side of the figure we list the reward transformations that characterise the ambiguity of the reward objects to the left.
  • ...and 21 more figures
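
The relationship $Q^\star \to V^\star$ described in the caption of Figure 2 can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the two-state, two-action deterministic MDP (`NEXT`), the discount `GAMMA`, the binary reward grid, and all function names are hypothetical and not taken from the paper. It uses the standard identity $V^\star(s) = \max_a Q^\star(s, a)$, so that $v = h \circ q$ with $h$ taking the maximum over actions.

```python
from collections import defaultdict
from itertools import product

import numpy as np

GAMMA = 0.5
# Hypothetical deterministic MDP: NEXT[s, a] is the successor of state s under action a.
NEXT = np.array([[0, 1], [0, 1]])


def optimal_q(reward, iters=200):
    """Value iteration for Q*; `reward` has shape (2, 2), indexed [state, action]."""
    q = np.zeros((2, 2))
    for _ in range(iters):
        v = q.max(axis=1)             # V(s) = max_a Q(s, a)
        q = reward + GAMMA * v[NEXT]  # Bellman optimality backup
    return q


def q_object(reward):
    """Reward object q: R -> Q*, rounded so that it can be used as a dictionary key."""
    return tuple(np.round(optimal_q(reward), 6).ravel())


def v_object(reward):
    """Reward object v: R -> V*, obtained from Q* by maximising over actions."""
    return tuple(np.round(optimal_q(reward).max(axis=1), 6))


def ambiguity_partition(obj, rewards):
    """Indices of rewards grouped by the value of the reward object `obj`."""
    blocks = defaultdict(set)
    for i, r in enumerate(rewards):
        blocks[obj(r)].add(i)
    return {frozenset(block) for block in blocks.values()}


def refines(p, q):
    """True if every block of p is contained in some block of q."""
    return all(any(a <= b for b in q) for a in p)


# Toy reward space: every reward table with entries in {0, 1}.
rewards = [np.array(vals, dtype=float).reshape(2, 2) for vals in product([0, 1], repeat=4)]

am_q = ambiguity_partition(q_object, rewards)
am_v = ambiguity_partition(v_object, rewards)

print(refines(am_q, am_v))  # True: Am(q) refines Am(v)
```

In this toy setting each reward table yields a distinct $Q^\star$, while several tables share the same $V^\star$, so the partition induced by $v$ is strictly coarser than the one induced by $q$.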

Theorems & Definitions (267)

  • Definition 1
  • Definition 2
  • Definition 3
  • Lemma 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Definition 7
  • Proposition 7
  • Proposition 7
  • ...and 257 more