A Unifying Framework for Causal Imitation Learning with Hidden Confounders

Daqian Shao; Thomas Kleine Buening; Marta Kwiatkowska

TL;DR

We propose DML-IL, a novel algorithm that uses instrumental variable regression to solve a set of Conditional Moment Restrictions (CMRs) and learn a policy, and we provide a bound on its imitation gap.

Abstract

We propose a general and unifying framework for causal Imitation Learning (IL) with hidden confounders that subsumes several existing confounded IL settings from the literature. Our framework accounts for two types of hidden confounders: (a) those observed by the expert, which thus influence the expert's policy, and (b) confounding noise hidden from both the expert and the IL algorithm. For additional flexibility, we also introduce a confounding noise horizon and time-varying expert-observable hidden variables. We show that causal IL in our framework can be reduced to a set of Conditional Moment Restrictions (CMRs) by leveraging trajectory histories as instruments to learn a history-dependent policy. We propose DML-IL, a novel algorithm that uses instrumental variable regression to solve these CMRs and learn a policy. We provide a bound on the imitation gap for DML-IL, which recovers prior results as special cases. Empirical evaluation on a toy environment with continuous state-action spaces and multiple MuJoCo tasks demonstrates that DML-IL outperforms state-of-the-art causal IL algorithms.
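To illustrate why instrumental variable regression helps when hidden noise affects both the policy input and the expert action, here is a minimal sketch of linear two-stage least squares on synthetic data. It is not DML-IL: the linear model, the scalar variables, and every name in it (`two_stage_least_squares`, `z`, `u`, `x`, `y`, `true_beta`) are assumptions made for illustration only. Here `z` stands in for an older piece of trajectory history used as the instrument, `u` for the confounding noise, `x` for the observed policy input, and `y` for the expert action.

```python
import numpy as np


def two_stage_least_squares(z, x, y):
    """Estimate the slope of y on x using z as an instrument (1-D arrays)."""
    # Stage 1: regress the confounded input x on the instrument z.
    Z = np.column_stack([np.ones_like(z), z])
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    # Stage 2: regress the target y on the stage-1 prediction of x.
    X_hat = np.column_stack([np.ones_like(x_hat), x_hat])
    return np.linalg.lstsq(X_hat, y, rcond=None)[0][1]


rng = np.random.default_rng(0)
n = 20_000
true_beta = 2.0  # hypothetical "expert" coefficient to recover

z = rng.normal(size=n)                 # instrument, independent of u
u = rng.normal(size=n)                 # hidden confounding noise
x = z + u + 0.1 * rng.normal(size=n)   # observed input, contaminated by u
y = true_beta * x + 3.0 * u + 0.1 * rng.normal(size=n)  # action, also hit by u

# Naive regression (plain behaviour cloning in this toy) is biased by u.
X = np.column_stack([np.ones(n), x])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Instrumented regression uses only the variation in x explained by z.
beta_iv = two_stage_least_squares(z, x, y)

print(f"naive estimate:        {beta_ols:.3f}")
print(f"instrumented estimate: {beta_iv:.3f}")
```

In this toy setting the naive estimate is pulled away from `true_beta` because `u` enters both `x` and `y`, while the instrumented estimate stays close to it because `z` is independent of `u`. The abstract describes solving the CMRs for a history-dependent policy rather than a scalar linear coefficient, so this should be read only as intuition for the role of the instrument.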
