Nonuniqueness and Convergence to Equivalent Solutions in Observer-based Inverse Reinforcement Learning

Jared Town, Zachary Morrison, Rushikesh Kamalapurkar

TL;DR

The paper tackles online IRL for deterministic linear systems in which multiple cost functionals can explain the observed expert behavior. It introduces a Regularized History Stack Observer (RHSO) to handle nonuniqueness, using a finite informativity (FI) data condition to guarantee convergence to an equivalent solution (or, when the solution is unique, to the true cost up to scale). The method builds on the History Stack Observer, replacing its inverse Gram-matrix term with a positive definite matrix and combining it with two history stacks and a purging mechanism that maintains rank. Simulations on problems with nonunique and unique solutions illustrate convergence of the equivalence metric and of the learned feedback gains, and the RHSO-KF variant shows robustness to measurement noise. The work advances online IRL by providing rigorous conditions for convergence to equivalence classes of cost functionals and practical guidance on data informativity for real-time learning.
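Why an inverse Gram term is a problem under nonuniqueness can be illustrated with a generic rank-deficient linear equation $\Sigma W = \Sigma_u$. The sketch below is not the RHSO update law, which this summary does not reproduce. It is a plain discrete-time gradient iteration in which an assumed positive definite gain $\Gamma$ stands in for $(\Sigma^\top \Sigma)^{-1}$, a matrix that does not exist when $\Sigma$ is rank deficient. Started from two different initial guesses, the iteration fits the data exactly both times but settles on different weights that differ by an element of $\mathrm{Null}(\Sigma)$. This is loosely analogous to the nonuniqueness the RHSO is designed to tolerate. The matrices, the gain, and the function name `pd_gain_iteration` are hypothetical.

```python
import numpy as np

def pd_gain_iteration(Sigma, Sigma_u, Gamma, W0, iters=2000):
    """Discrete-time gradient iteration on 0.5 * ||Sigma @ W - Sigma_u||^2.

    The fixed positive definite gain Gamma stands in for the inverse Gram
    matrix (Sigma.T @ Sigma)^{-1}, which does not exist when Sigma is rank
    deficient. Illustrative only; this is not the paper's RHSO update law.
    """
    G = Sigma.T @ Sigma
    # Step size chosen so every eigenvalue of step * Gamma @ G lies in [0, 1].
    step = 1.0 / (np.linalg.norm(Gamma, 2) * np.linalg.norm(G, 2))
    W = W0.astype(float).copy()
    for _ in range(iters):
        W = W - step * Gamma @ (Sigma.T @ (Sigma @ W - Sigma_u))
    return W

# Hypothetical rank-deficient data: column 3 = column 1 + column 2,
# so Null(Sigma) is spanned by [1, 1, -1].
Sigma = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 1.0, 2.0],
                  [2.0, 0.0, 2.0]])
W_true = np.array([1.0, -1.0, 0.5])
Sigma_u = Sigma @ W_true
Gamma = np.diag([1.0, 2.0, 0.5])  # any symmetric positive definite gain

W_a = pd_gain_iteration(Sigma, Sigma_u, Gamma, np.zeros(3))
W_b = pd_gain_iteration(Sigma, Sigma_u, Gamma, np.array([3.0, 3.0, 3.0]))

# Both estimates reproduce the data, yet they differ by a multiple of [1, 1, -1].
print(np.linalg.norm(Sigma @ W_a - Sigma_u), np.linalg.norm(Sigma @ W_b - Sigma_u))
print(W_a - W_b)
```

The step size is tied to $\|\Gamma\|_2 \|\Sigma^\top\Sigma\|_2$ so that the iteration contracts along the range directions without oscillating, while the null-space component of the initial guess is left untouched.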

Abstract

A key challenge in solving the deterministic inverse reinforcement learning (IRL) problem online and in real time is the existence of multiple solutions. Nonuniqueness necessitates the study of equivalent solutions, i.e., solutions that correspond to a different cost functional but yield the same feedback matrix, and of convergence to such solutions. While offline algorithms that converge to equivalent solutions have been developed in the literature, online, real-time techniques that address nonuniqueness are not available. In this paper, a regularized history stack observer that converges to approximately equivalent solutions of the IRL problem is developed. Novel data-richness conditions are developed to facilitate the analysis, and simulation results are provided to demonstrate the effectiveness of the developed technique.
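The simplest family of equivalent solutions can be checked numerically. The sketch assumes the standard infinite-horizon LQR cost $\int_0^\infty (x^\top Q x + u^\top R u)\,dt$, which this summary does not spell out. Replacing $(Q, R)$ with $(cQ, cR)$ for any $c > 0$ scales the Riccati solution $P$ by $c$ and leaves $K = R^{-1} B^\top P$ unchanged, which is why even a unique solution can only be recovered up to scale. The Riccati equation is solved here through the stable invariant subspace of the Hamiltonian matrix using NumPy only (`scipy.linalg.solve_continuous_are` would serve equally well). The double-integrator system and the helper names are hypothetical.

```python
import numpy as np

def care_hamiltonian(A, B, Q, R):
    """Stabilizing solution P of A^T P + P A - P B R^{-1} B^T P + Q = 0.

    Takes the n-dimensional stable invariant subspace [U1; U2] of the
    Hamiltonian matrix and returns P = U2 @ inv(U1). Assumes (A, B) is
    stabilizable and (Q^{1/2}, A) is detectable, so exactly n eigenvalues
    of the Hamiltonian have negative real part.
    """
    n = A.shape[0]
    H = np.block([[A, -B @ np.linalg.solve(R, B.T)],
                  [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    stable = eigvecs[:, eigvals.real < 0]
    U1, U2 = stable[:n, :], stable[n:, :]
    P = np.real(U2 @ np.linalg.inv(U1))
    return 0.5 * (P + P.T)  # symmetrize away rounding error

def lqr_gain(A, B, Q, R):
    """Optimal gain K = R^{-1} B^T P for the control law u = -K x."""
    P = care_hamiltonian(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)

# Hypothetical example: double integrator with Q = I, R = 1.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

K1 = lqr_gain(A, B, Q, R)
K2 = lqr_gain(A, B, 5.0 * Q, 5.0 * R)  # different cost functional, same gain
print(K1, K2)
```

For this system the gain is $K = [1, \sqrt{3}]$ for both cost functionals. Nonuniqueness in the paper's sense can go beyond such scaling, which this check does not cover.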

Paper Structure (12 sections, 3 theorems, 25 equations, 10 figures)

Key Result

lemma 1

If $\hat{\Sigma}$ and $\Sigma_u$ satisfy the condition labeled "Sigma_u Condition", then $\Omega_\Delta \cap \mathop{\mathrm{Null}}\nolimits(\hat{\Sigma}^{\top}) = \{0\}$, where $\Omega_\Delta \coloneqq \{ \Delta \in \mathbb{R}^{N(m+1)} \mid \Delta = \Sigma_u - \hat{\Sigma} \hat{W}$, for some $\hat{W} \in \math
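The conclusion of Lemma 1 can be read through a standard fact: $\mathrm{Range}(\hat{\Sigma})$ and $\mathrm{Null}(\hat{\Sigma}^\top)$ are orthogonal complements. The exact form of the $\Sigma_u$ condition is cut off in this summary, so the sketch assumes, purely for illustration, the sufficient case $\Sigma_u \in \mathrm{Range}(\hat{\Sigma})$. Then every $\Delta = \Sigma_u - \hat{\Sigma}\hat{W}$ lies in the range and meets $\mathrm{Null}(\hat{\Sigma}^\top)$ only at $0$. A second case adds a component outside the range, which produces a nonzero least-squares residual inside $\mathrm{Null}(\hat{\Sigma}^\top)$. The random data and the helper `svd_rank` are hypothetical.

```python
import numpy as np

def svd_rank(M, tol=1e-10):
    """Full SVD of M plus its numerical rank (relative threshold tol)."""
    U, s, Vt = np.linalg.svd(M)
    rank = int(np.sum(s > tol * s.max()))
    return U, s, Vt, rank

rng = np.random.default_rng(0)
# Hypothetical stand-in for Sigma_hat: 6 x 3 with column 3 = column 1 + column 2.
base = rng.standard_normal((6, 2))
Sigma_hat = np.column_stack([base, base[:, 0] + base[:, 1]])
U, s, Vt, r = svd_rank(Sigma_hat)
U_null = U[:, r:]            # orthonormal basis of Null(Sigma_hat^T)
P_null = U_null @ U_null.T   # orthogonal projector onto Null(Sigma_hat^T)

# Case 1: Sigma_u in Range(Sigma_hat). Every Delta = Sigma_u - Sigma_hat @ W then
# lies in Range(Sigma_hat), so its projection onto Null(Sigma_hat^T) vanishes.
Sigma_u = Sigma_hat @ np.array([1.0, -2.0, 0.5])
worst_in_range = max(
    np.linalg.norm(P_null @ (Sigma_u - Sigma_hat @ rng.standard_normal(3)))
    for _ in range(100)
)

# Case 2: add a component outside Range(Sigma_hat). The minimum-norm
# least-squares solution (truncated SVD) leaves a nonzero residual that
# lies in Null(Sigma_hat^T), so the intersection is no longer {0}.
Sigma_u_bad = Sigma_u + P_null @ rng.standard_normal(6)
W_ls = Vt[:r].T @ ((U[:, :r].T @ Sigma_u_bad) / s[:r])
Delta_ls = Sigma_u_bad - Sigma_hat @ W_ls
print(worst_in_range, np.linalg.norm(Delta_ls))
```

The least-squares solution is formed from the truncated SVD rather than a generic solver so that the near-zero singular value of the rank-deficient matrix is discarded explicitly.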

Figures (10)

  • Figure 1: A log-scale plot of the 2-norm of $\Delta$ as a function of time.
  • Figure 2: A log-scale plot of the induced 2-norm of the error between the estimated feedback gain and the feedback gain of the expert as a function of time.
  • Figure 3: A log-scale plot of the 2-norm of the error between the state trajectory of the expert and the state trajectory of the learner under the learned feedback gain for a problem that admits multiple solutions. The red trajectory corresponds to the feedback gain learned using the RHSO and the blue trajectory corresponds to the feedback gain computed using offline ridge regression.
  • Figure 4: A plot of the induced 2-norm of the error between the estimated $\hat{Q}$ (red) and $\hat{R}$ (blue) matrices and the $Q$ and $R$ matrices of the expert as a function of time.
  • Figure 5: An indicator that equals 1 if $\mathop{\mathrm{Span}}\nolimits\{\hat{x}(t_i(t))\}_{i=1}^N = \mathbb{R}^n$ and 0 otherwise.
  • ...and 5 more figures
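Two of the plotted quantities are simple to compute from stored data. The sketch below assumes the history stack holds $N$ state estimates $\hat{x}(t_i) \in \mathbb{R}^n$ and that feedback gains are NumPy arrays. It gives the span indicator shown in Figure 5 and the induced 2-norm gain error shown in Figure 2. The function names and the example values are hypothetical.

```python
import numpy as np

def span_indicator(states):
    """1 if the stored state estimates span R^n, else 0 (the quantity in Figure 5)."""
    X = np.column_stack(states)  # n x N; column i is x_hat(t_i)
    return int(np.linalg.matrix_rank(X) == X.shape[0])

def gain_error(K_hat, K):
    """Induced 2-norm, i.e. largest singular value, of K_hat - K (as in Figure 2)."""
    return float(np.linalg.norm(K_hat - K, ord=2))

# Hypothetical history stack in R^2: two collinear states do not span the plane.
stack = [np.array([1.0, 0.0]), np.array([2.0, 0.0])]
print(span_indicator(stack))                           # 0
print(span_indicator(stack + [np.array([0.0, 1.0])]))  # 1
print(gain_error(np.array([[1.0, 2.0]]), np.array([[1.0, 1.0]])))  # 1.0
```

The rank test uses NumPy's default SVD-based tolerance; noisy estimates may call for an explicit threshold chosen for the application.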

Theorems & Definitions (12)

  • definition 1
  • definition 2
  • remark 1
  • remark 2
  • lemma 1
  • proof
  • theorem 1
  • proof
  • remark 3
  • definition 3
  • ...and 2 more