Automated Feature Selection for Inverse Reinforcement Learning

Daulet Baimukashev, Gokhan Alcan, Ville Kyrki

TL;DR

This work proposes a method that uses polynomial basis functions to form a candidate set of features, which are shown to allow matching the statistical moments of state distributions. It demonstrates the approach's effectiveness by recovering reward functions that capture expert policies across non-linear control tasks of increasing complexity.
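As an illustration of the candidate-set idea, the sketch below builds every monomial of the state variables up to a chosen degree and averages those features over the states visited in a set of trajectories. For degree 2 this average is exactly the empirical first and second moments of the state distribution. It is a minimal sketch under stated assumptions, not the authors' code: the function names `polynomial_features` and `feature_expectation` are hypothetical, and the plain per-state average (no discounting) is an assumption.

```python
from itertools import combinations_with_replacement
from typing import List, Sequence


def polynomial_features(state: Sequence[float], degree: int = 2) -> List[float]:
    """All monomials of the state variables with total degree 1..degree (hypothetical helper)."""
    features: List[float] = []
    for d in range(1, degree + 1):
        # Each index tuple such as (0, 1) stands for the monomial s_0 * s_1.
        for idx in combinations_with_replacement(range(len(state)), d):
            value = 1.0
            for i in idx:
                value *= state[i]
            features.append(value)
    return features


def feature_expectation(
    trajectories: Sequence[Sequence[Sequence[float]]], degree: int = 2
) -> List[float]:
    """Empirical mean of the candidate features over all visited states.

    Assumption: a plain per-state average (no discounting, no weighting).
    """
    totals: List[float] = []
    count = 0
    for trajectory in trajectories:
        for state in trajectory:
            phi = polynomial_features(state, degree)
            if count == 0:
                totals = [0.0] * len(phi)
            totals = [t + f for t, f in zip(totals, phi)]
            count += 1
    if count == 0:
        raise ValueError("need at least one state")
    return [t / count for t in totals]


# Degree-2 candidates for s = (2, 3): s0, s1, s0^2, s0*s1, s1^2
print(polynomial_features((2.0, 3.0)))  # [2.0, 3.0, 4.0, 6.0, 9.0]
```

With n state variables the degree-2 candidate set has n + n(n+1)/2 entries, so it stays small for low-dimensional control tasks but grows quickly with the degree.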

Abstract

Inverse reinforcement learning (IRL) is an imitation learning approach that learns reward functions from expert demonstrations. It avoids the difficult and tedious procedure of manual reward specification while retaining the generalization power of reinforcement learning. In IRL, the reward is usually represented as a linear combination of features. In continuous state spaces, the state variables alone are not rich enough to serve as features, yet it is generally not known which features are suitable. To address this issue, we propose a method that employs polynomial basis functions to form a candidate set of features, which are shown to allow the matching of statistical moments of state distributions. Feature selection is then performed over the candidates by leveraging the correlation between trajectory probabilities and feature expectations. We demonstrate the approach's effectiveness by recovering reward functions that capture expert policies across non-linear control tasks of increasing complexity. Code, data, and videos are available at https://sites.google.com/view/feature4irl.
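The abstract does not spell out the exact selection rule, so the following is only a hypothetical instance of "leveraging the correlation between trajectory probabilities and feature expectations": each candidate feature is scored by the absolute Pearson correlation, across trajectories, between its per-trajectory feature expectation and the trajectory's log-probability, and the k highest-scoring candidates are kept. The names `pearson` and `select_features`, the use of log-probabilities as given inputs, and the top-k cut-off are all assumptions for illustration.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_features(
    traj_features: List[List[float]], traj_log_probs: Sequence[float], k: int
) -> List[int]:
    """Hypothetical selection rule.

    traj_features[t][j] is candidate feature j's expectation on trajectory t.
    Returns the (sorted) indices of the k candidates whose values correlate most
    strongly, in absolute value, with the trajectory log-probabilities.
    """
    n_feats = len(traj_features[0])
    scores = [
        abs(pearson([row[j] for row in traj_features], traj_log_probs))
        for j in range(n_feats)
    ]
    ranked = sorted(range(n_feats), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Four trajectories, three candidate features (feature 1 is constant).
F = [[1.0, 5.0, 0.0], [2.0, 5.0, 1.0], [3.0, 5.0, 0.0], [4.0, 5.0, 1.0]]
log_p = [-4.0, -3.0, -2.0, -1.0]
print(select_features(F, log_p, 2))  # [0, 2]
```

Ranking by absolute correlation keeps features that co-vary with how likely a trajectory is, whether positively or negatively; constant features score 0 and fall to the bottom of the ranking.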

Paper Structure

This paper contains 16 sections, 1 theorem, 17 equations, 4 figures, 2 tables, 1 algorithm.

Key Result

Proposition 1

Matching the expectations of features consisting of second-order polynomials leads to matching the mean and variance of the distributions.
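A short derivation shows why the proposition holds, assuming the feature set contains all monomials of degree one and two of the state vector s ∈ ℝⁿ (the products s_i s_j with i ≤ j cover every entry of the symmetric matrix s sᵀ). Here p and q are notation introduced for the expert and learner state distributions; this sketches the argument and is not the paper's own proof.

```latex
\phi(s) = \big(\, s_i \;(1 \le i \le n),\;\; s_i s_j \;(1 \le i \le j \le n) \,\big)

\mathbb{E}_{s \sim p}[\phi(s)] = \mathbb{E}_{s \sim q}[\phi(s)]
\;\Longrightarrow\;
\mu_p = \mathbb{E}_{p}[s] = \mathbb{E}_{q}[s] = \mu_q,
\qquad
\mathbb{E}_{p}[s s^\top] = \mathbb{E}_{q}[s s^\top]

\Sigma_p = \mathbb{E}_{p}[s s^\top] - \mu_p \mu_p^\top
         = \mathbb{E}_{q}[s s^\top] - \mu_q \mu_q^\top
         = \Sigma_q
```

The diagonal of Σ gives the variances, so means and variances agree. Without the first-order terms the means need not match, and only the second raw moments would be equal.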

Figures (4)

  • Figure 1: A central open challenge in inverse reinforcement learning is the choice of suitable features to represent the reward. We propose a method that constructs a candidate feature set and then selects a subset that best describes expected rewards.
  • Figure 2: Benchmark tasks used in this paper.
  • Figure 3: Mean cumulative rewards for policies trained using various feature sets, calculated across 10 different initial conditions. A) Pendulum, B) Acrobot, C) CartPole.
  • Figure 4: 2D Wasserstein distance between training and testing data for the Pendulum and Acrobot environments.
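The caption does not specify the order of the Wasserstein distance or the solver used. As a hedged illustration of the quantity itself, the sketch computes the exact W_p between two equally sized 2D point clouds with uniform weights. In that case an optimal transport plan can always be taken to be a one-to-one matching (Birkhoff-von Neumann), so a brute-force search over permutations suffices. The function `wasserstein_2d` is hypothetical and only practical for a handful of points; realistic sample sizes call for a dedicated optimal-transport or assignment solver.

```python
import math
from itertools import permutations
from typing import Sequence, Tuple

Point = Tuple[float, float]


def wasserstein_2d(xs: Sequence[Point], ys: Sequence[Point], p: int = 1) -> float:
    """Exact W_p between two uniform empirical distributions of equal size.

    Brute-forces every one-to-one matching (n! of them), so it is only
    usable for very small samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized samples")
    n = len(xs)
    best = math.inf
    for perm in permutations(range(n)):
        # Average transport cost when point i is moved to point perm[i].
        cost = sum(math.dist(xs[i], ys[perm[i]]) ** p for i in range(n)) / n
        best = min(best, cost)
    return best ** (1.0 / p)


train = [(0.0, 0.0), (1.0, 0.0)]
test = [(0.0, 1.0), (1.0, 1.0)]
print(wasserstein_2d(train, test))  # 1.0
```

Shifting every point by one unit yields a distance of 1, which is why this metric is a natural way to quantify how far test states lie from the training distribution.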

Theorems & Definitions (2)

  • Proposition 1
  • Proof