
Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach

Renzi Wang, Flavia Sofia Acerbo, Tong Duy Son, Panagiotis Patrinos

TL;DR

This paper presents a novel approach to imitation learning from observations, where an autoregressive mixture of experts model is deployed to fit the underlying policy through a two-stage framework, incorporating a Lyapunov stability constraint to ensure asymptotic stability of the identified model.

Abstract

This paper presents a novel approach to imitation learning from observations, where an autoregressive mixture of experts model is deployed to fit the underlying policy. The parameters of the model are learned via a two-stage framework. By leveraging existing knowledge of the system dynamics, the first stage estimates the control input sequences and hence reduces the problem complexity. In the second stage, the policy is learned by solving a regularized maximum-likelihood estimation problem using the estimated control input sequences. We further extend the learning procedure by incorporating a Lyapunov stability constraint that ensures asymptotic stability of the identified model, enabling accurate multi-step predictions. The effectiveness of the proposed framework is validated on two autonomous driving datasets collected from human demonstrations, demonstrating its practical applicability in modelling complex nonlinear dynamics.
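The two-stage framework described in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions not taken from the paper: the known dynamics are linear, $x_{t+1} = A x_t + B u_t$, so stage 1 reduces to a least-squares solve for each $u_t$; and stage 2 fits a mixture of linear-Gaussian experts with input-independent mixing weights (a simplification of a gated mixture of experts) by EM, with a ridge penalty `lam` standing in for the paper's regularization. The names `estimate_inputs`, `fit_mixture`, and the regressor matrix `Phi` are hypothetical.

```python
import numpy as np


def estimate_inputs(X, A, B):
    """Stage 1 sketch: recover u_t from observed states X (T, n_x), assuming
    known linear dynamics x_{t+1} = A x_t + B u_t (a hypothetical stand-in)."""
    R = X[1:] - X[:-1] @ A.T                      # part of x_{t+1} not explained by A x_t
    U, *_ = np.linalg.lstsq(B, R.T, rcond=None)   # least squares B u_t = r_t, all t at once
    return U.T                                    # (T-1, n_u)


def fit_mixture(Phi, U, K, lam=1e-3, iters=100, seed=0):
    """Stage 2 sketch: ridge-regularized maximum likelihood (via EM) for K linear-Gaussian
    experts u_t ~ N(W_k phi_t, Sigma_k) with input-independent mixing weights pi_k."""
    rng = np.random.default_rng(seed)
    N, p = Phi.shape
    m = U.shape[1]
    R = rng.dirichlet(np.ones(K), size=N)         # random initial responsibilities (N, K)
    for _ in range(iters):
        # M-step: weighted ridge regression and weighted residual covariance per expert
        pis = R.mean(axis=0)
        Ws, Sigmas = [], []
        for k in range(K):
            w = R[:, k][:, None]
            G = Phi.T @ (w * Phi) + lam * np.eye(p)
            W = np.linalg.solve(G, Phi.T @ (w * U)).T        # (m, p)
            E = U - Phi @ W.T
            Sig = (w * E).T @ E / w.sum() + 1e-6 * np.eye(m)  # small jitter keeps Sig invertible
            Ws.append(W)
            Sigmas.append(Sig)
        # E-step: posterior probability that each sample came from each expert
        logp = np.empty((N, K))
        for k in range(K):
            E = U - Phi @ Ws[k].T
            maha = np.einsum("ni,ij,nj->n", E, np.linalg.inv(Sigmas[k]), E)
            _, logdet = np.linalg.slogdet(Sigmas[k])
            logp[:, k] = np.log(pis[k]) - 0.5 * (maha + logdet + m * np.log(2 * np.pi))
        logp -= logp.max(axis=1, keepdims=True)   # numerical stability before exp
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)
    return pis, Ws, Sigmas
```

For example, `U_hat = estimate_inputs(X, A, B)` followed by `fit_mixture(X[:-1], U_hat, K=3)` uses the previous state as the regressor; an autoregressive variant would also append lagged inputs to `Phi`.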

Paper Structure

This paper contains 12 sections, 1 theorem, 34 equations, 3 figures, 2 tables, 1 algorithm.

Key Result

Lemma 4.1

Consider system eq: policy_esti_stable. Let $\Lambda_i = \Sigma_i^{-1}$, and define a linear selection operator $S(\cdot)$ that selects the first $n_u$ columns of the matrix $C_i$, so that $S(C_i) = \Sigma_i^{-1}A_i$. Then the inequality implies eq: lyapunov_cons.
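The change of variables in Lemma 4.1 can be sketched numerically in Python. Given $\Lambda_i$ and $S(C_i)$, the expert matrix is recovered as $A_i = \Lambda_i^{-1} S(C_i)$, and $S$ is linear because it is right-multiplication by $E = [I_{n_u};\, 0]$. Since eq: lyapunov_cons is not reproduced on this page, the stability check below assumes that $A_i$ is a square $n_u \times n_u$ autoregressive matrix and tests the standard discrete-time condition $P \succ 0$, $A_i^\top P A_i - P \prec 0$ for every expert with a common $P$. The helper names are hypothetical.

```python
import numpy as np


def select_first_columns(C, n_u):
    """Linear selection operator S(C) = C E with E = [I_{n_u}; 0]:
    keeps the first n_u columns of C."""
    E = np.zeros((C.shape[1], n_u))
    E[:n_u, :n_u] = np.eye(n_u)
    return C @ E


def recover_A(Lam, C, n_u):
    """With Lambda_i = Sigma_i^{-1} and S(C_i) = Sigma_i^{-1} A_i,
    the expert matrix is A_i = Lambda_i^{-1} S(C_i)."""
    return np.linalg.solve(Lam, select_first_columns(C, n_u))


def is_common_lyapunov(P, A_list, tol=1e-9):
    """Assumed condition: symmetric P > 0 and A_i^T P A_i - P < 0 for all experts."""
    if np.linalg.eigvalsh(P).min() <= tol:
        return False
    for A in A_list:
        M = A.T @ P @ A - P
        # symmetrize before eigvalsh, which expects a symmetric matrix
        if np.linalg.eigvalsh(0.5 * (M + M.T)).max() >= -tol:
            return False
    return True
```

In a learning setting such a $P$ would be a decision variable rather than fixed in advance; the sketch only verifies a given candidate.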

Figures (3)

  • Figure 1: Illustration of Frenet coordinates
  • Figure 2: Recursive one-step-ahead prediction for the lane-keeping scenario. Shaded regions represent the range between the $0.25$ and $0.75$ quantiles. Improper init denotes the case where $p(\xi_0)$ follows a uniform distribution and $\hat{u}_{0} = (-1.5, -0.04)$.
  • Figure 3: Recursive one-step-ahead prediction for the double-lane-change scenario. Shaded regions represent the range between the $0.25$ and $0.75$ quantiles. Improper init denotes the case where $p(\xi_0)$ follows a uniform distribution and $\hat{u}_{0} = (-0.05, -0.01)$. For visual clarity, only the first 20 steps of the case without eq: lmi_stable are shown, because that trajectory oscillates significantly.
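The recursive predictions and quantile bands in Figures 2 and 3 can be imitated with a small Monte Carlo sketch in Python: each step feeds the previous prediction back in, and the band is the 0.25 to 0.75 quantile range across sampled trajectories. The model $u_{t+1} = A_k u_t + w_t$, with an expert $k$ drawn independently at each step from fixed weights, is an assumed simplification that ignores the gating state $\xi_t$, and `rollout_quantiles` is a hypothetical name.

```python
import numpy as np


def rollout_quantiles(A_list, pis, Sigmas, u0, steps, n_samples=2000, seed=0):
    """Monte Carlo sketch of recursive prediction: each step feeds the previous
    prediction back in, u_{t+1} = A_k u_t + w_t with w_t ~ N(0, Sigma_k).
    Returns the 0.25 and 0.75 quantiles of u_t at every step."""
    rng = np.random.default_rng(seed)
    A = np.stack(A_list)                          # (K, n_u, n_u)
    L = np.linalg.cholesky(np.stack(Sigmas))      # per-expert noise factors
    n_u = u0.shape[0]
    traj = np.empty((n_samples, steps + 1, n_u))
    traj[:, 0] = u0
    for t in range(steps):
        # one expert per sample and step (simplified, state-independent gating)
        k = rng.choice(len(A_list), size=n_samples, p=pis)
        z = rng.standard_normal((n_samples, n_u))
        traj[:, t + 1] = (np.einsum("sij,sj->si", A[k], traj[:, t])
                          + np.einsum("sij,sj->si", L[k], z))
    q25, q75 = np.quantile(traj, [0.25, 0.75], axis=0)   # each (steps + 1, n_u)
    return q25, q75
```

With a stable expert such as $0.5I$ the band settles to a fixed width, whereas an expert with spectral radius above one (e.g. $1.1I$) makes the band grow geometrically, illustrating why multi-step predictions degrade without a stability guarantee. The initial value $(-1.5, -0.04)$ from Figure 2 can be passed as `u0` purely as an illustration.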

Theorems & Definitions (2)

  • Lemma 4.1
  • Proof