Data-Driven Inverse Optimal Control for Continuous-Time Nonlinear Systems

Hamed Jabbari Asl, Eiji Uchibe

TL;DR

The paper tackles the problem of recovering cost functions for continuous-time nonlinear deterministic systems from expert trajectories. It introduces a dual-branch approach: a model-free branch estimates the input penalty $\mathbf{R}$ and the value-function weights $\mathbf{W}_V$ via gradient descent using policy information, and an HJB-based branch then estimates the state penalty $Q(\cdot)$ through its weights $\mathbf{W}_{Ql}$; an enhanced HJB formulation uses off-policy signals to enrich the data for robust estimation. For the special case of known input dynamics, Algorithm 2 further reduces complexity by removing the need for interaction between the learner and the environment. Across simulations in MATLAB and MuJoCo, the method demonstrates accurate cost-function recovery, convergence of learned policies to the expert’s, and resilience to noise and model uncertainties, highlighting its potential for deployment in autonomous systems and robotics.

Abstract

This paper introduces a novel model-free and a partially model-free algorithm for inverse optimal control (IOC), also known as inverse reinforcement learning (IRL), aimed at estimating the cost function of continuous-time nonlinear deterministic systems. Using the input-state trajectories of an expert agent, the proposed algorithms separately utilize control policy information and the Hamilton-Jacobi-Bellman equation to estimate different sets of cost function parameters. This approach allows the algorithms to achieve broader applicability while maintaining a model-free framework. Also, the model-free algorithm reduces complexity compared to existing methods, as it requires solving a forward optimal control problem only once during initialization. Furthermore, in our partially model-free algorithm, this step can be bypassed entirely for systems with known input dynamics. Simulation results demonstrate the effectiveness and efficiency of our algorithms, highlighting their potential for real-world deployment in autonomous systems and robotics.
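
To make the split between "control policy information" and the Hamilton-Jacobi-Bellman equation more concrete, the relations below sketch both in the standard control-affine setting. The dynamics $\dot{x} = f(x) + g(x)u$, the running cost $Q(x) + u^\top \mathbf{R} u$, and the symbols $f$, $g$, and $V$ are assumptions taken from the usual continuous-time optimal control setup; this summary does not state the paper's exact formulation.

```latex
% Assumed setup (illustrative, not quoted from the paper):
% control-affine dynamics with a state penalty Q and a quadratic input penalty R.
\begin{align}
  \dot{x} &= f(x) + g(x)\,u, \qquad
  J = \int_0^\infty \big( Q(x) + u^\top \mathbf{R}\, u \big)\, dt \\
  % Optimal policy in terms of the value function V:
  u^*(x) &= -\tfrac{1}{2}\, \mathbf{R}^{-1} g(x)^\top \nabla V(x) \\
  % HJB equation satisfied by V along the optimal policy:
  0 &= Q(x) + u^*(x)^\top \mathbf{R}\, u^*(x)
       + \nabla V(x)^\top \big( f(x) + g(x)\, u^*(x) \big)
\end{align}
```

Under these assumptions the policy relation involves $\mathbf{R}$, $\nabla V$, and the input dynamics $g$, but not $Q$, while $Q$ enters only through the HJB equation. This is consistent with estimating the two sets of parameters separately, and with the simplification that becomes available when the input dynamics are known.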

Paper Structure

This paper contains 12 sections, 1 theorem, 36 equations, 6 figures, 2 algorithms.

Key Result

Lemma 1

For systems with a single-dimensional input ($m=1$), an arbitrary fixed positive value can be assigned to $\mathbf{R}_l$. Consequently, the $\mathbf{R}_l$ update step of Algorithm 1 is not required when $m=1$.
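
One way to see why a fixed input penalty loses no generality for a scalar input is the scale ambiguity of the cost: multiplying the state and input penalties by the same positive constant leaves the optimal policy unchanged. The sketch below checks this on a scalar linear-quadratic problem, which is an illustrative assumption and not one of the paper's examples; the function `scalar_lqr_gain` and the numerical values are hypothetical.

```python
import math


def scalar_lqr_gain(a, b, q, r):
    """Return (k, p) for x' = a*x + b*u with running cost q*x**2 + r*u**2.

    The optimal policy is u = -k*x and the value function is V(x) = p*x**2.
    Illustrative scalar LQR only; not the paper's algorithm.
    """
    # Positive root of the scalar Riccati equation 2*a*p - (b**2 / r)*p**2 + q = 0.
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    k = b * p / r
    return k, p


# Hypothetical system and cost parameters.
a, b = -1.0, 2.0
q, r = 3.0, 0.5

k1, p1 = scalar_lqr_gain(a, b, q, r)

# Scale both penalties by the same positive constant c.
c = 7.0
k2, p2 = scalar_lqr_gain(a, b, c * q, c * r)

print(k1, k2)      # the two gains coincide
print(p1, p2 / c)  # the value-function coefficient scales by c
```

Because the gain depends on the cost only through the ratio $q/r$, any positive value can be chosen for the scalar input penalty, and the remaining cost parameters absorb the scale.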

Figures (6)

  • Figure 1: Block diagram of the proposed IRL/IOC method (Algorithm 1).
  • Figure 2: Block diagram of the proposed IRL/IOC method (Algorithm 2).
  • Figure 3: Evolution of the norm of the control gain $\mathbf{K}_l$ and the parameter vector $\mathbf{W}_{Vl}$ for the system of the first simulation example.
  • Figure 4: Evolution of the norm of the control gain $\mathbf{K}_l$ and the parameter vector $\mathbf{W}_{Vl}$ for the system of the second simulation example.
  • Figure 5: (a) MuJoCo environment used for the quadrotor simulation. (b) Evolution of the norm of control policies estimated through model-free RL using the expert's original cost function and the estimated cost function.
  • ...and 1 more figure

Theorems & Definitions (4)

  • Definition 1
  • Lemma 1
  • proof
  • Remark 1