Nonuniqueness and Convergence to Equivalent Solutions in Observer-based Inverse Reinforcement Learning
Jared Town, Zachary Morrison, Rushikesh Kamalapurkar
TL;DR
The paper tackles online IRL for deterministic linear systems when multiple cost functionals can explain the observed expert behavior. It introduces a Regularized History Stack Observer (RHSO) to handle nonuniqueness, using a finite informativity (FI) data condition to guarantee convergence to an equivalent solution (or, when the solution is unique, to the true cost up to scale). The method builds on the History Stack Observer by replacing the inverse Gram term with a positive definite matrix and by using two history stacks with purging to maintain rank. Simulations on problems with nonunique and with unique solutions illustrate convergence of the equivalence metric and of the learned feedback gains, with RHSO-KF demonstrating robustness to measurement noise. The work advances online IRL by providing rigorous conditions for convergence to equivalence classes of cost functionals and practical guidance on data informativity for real-time learning.
Abstract
A key challenge in solving the deterministic inverse reinforcement learning (IRL) problem online and in real time is the existence of multiple solutions. Nonuniqueness necessitates the study of equivalent solutions, i.e., solutions that result in a different cost functional but the same feedback matrix, and of convergence to such solutions. While offline algorithms that converge to equivalent solutions have been developed in the literature, online, real-time techniques that address nonuniqueness are not available. In this paper, a regularized history stack observer that converges to approximately equivalent solutions of the IRL problem is developed. Novel data-richness conditions are developed to facilitate the analysis, and simulation results are provided to demonstrate the effectiveness of the developed technique.
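The notion of equivalent solutions can be illustrated numerically. The sketch below is not taken from the paper: it assumes a continuous-time linear-quadratic setting with cost weights Q and R, and the system matrices, the weights, and the helper `lqr_gain` are hypothetical choices made only for illustration. It shows that a scaled cost and a differently structured cost can both produce the same feedback matrix as a baseline cost, while an unrelated cost does not.

```python
import numpy as np

def lqr_gain(A, B, Q, R):
    """Feedback matrix K = R^{-1} B^T P for continuous-time LQR.

    P is the stabilizing solution of the algebraic Riccati equation,
    computed from the stable invariant subspace of the Hamiltonian matrix.
    (Illustrative helper, not part of the paper.)
    """
    n = A.shape[0]
    R_inv = np.linalg.inv(R)
    H = np.block([[A, -B @ R_inv @ B.T],
                  [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    stable = eigvecs[:, eigvals.real < 0]      # n stable eigenvectors
    X1, X2 = stable[:n, :], stable[n:, :]
    P = np.real(X2 @ np.linalg.inv(X1))
    return R_inv @ B.T @ P

# Hypothetical two-state, single-input system.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
R = np.array([[1.0]])

# Baseline cost.
Q_base = np.eye(2)
K_base = lqr_gain(A, B, Q_base, R)

# Equivalent by scaling: (3Q, 3R) yields the same feedback matrix.
K_scaled = lqr_gain(A, B, 3.0 * Q_base, 3.0 * R)

# Equivalent but not a scalar multiple: Q - (A^T D + D A) with B^T D = 0,
# here D = diag(0.5, 0), gives a different Q and the same feedback matrix.
D = np.diag([0.5, 0.0])
Q_shifted = Q_base - (A.T @ D + D @ A)
K_shifted = lqr_gain(A, B, Q_shifted, R)

# A cost that is not equivalent to the baseline.
K_other = lqr_gain(A, B, np.diag([10.0, 1.0]), R)

print("baseline:", K_base)
print("scaled  :", K_scaled)
print("shifted :", K_shifted)
print("other   :", K_other)
```

In this sketch the shifted cost works because the perturbation D changes the Riccati solution only in directions that B^T annihilates, so the product B^T P, and hence the feedback matrix, is unchanged. An IRL method observing only the expert's behavior cannot distinguish among such costs, which is why convergence to an equivalence class is the natural target.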
