State and Input Constrained Output-Feedback Adaptive Optimal Control of Affine Nonlinear Systems

Tochukwu Elijah Ogri, Muzaffar Qureshi, Zachary I. Bell, Rushikesh Kamalapurkar

TL;DR

A novel online, output-feedback, critic-only, model-based reinforcement learning framework is developed for safety-critical control systems operating in complex environments. The framework ensures system stability and safety despite the lack of full-state measurements, while learning and implementing an optimal controller.

Abstract

In this paper, a novel online, output-feedback, critic-only, model-based reinforcement learning framework is developed for safety-critical control systems operating in complex environments. The developed framework ensures system stability and safety despite the lack of full-state measurements, while learning and implementing an optimal controller. The approach leverages a linear matrix inequality-based observer design method to efficiently search for observer gains that yield effective state estimation. Approximate dynamic programming is then used to develop an approximate controller that uses simulated experiences to guarantee the safety and stability of the closed-loop system. Safety is enforced by adding a recentered robust Lyapunov-like barrier function to the cost function, which enforces the safety constraints even in the presence of uncertainty in the state. Lyapunov-based stability analysis is used to guarantee uniform ultimate boundedness of the trajectories of the closed-loop system and to ensure safety. Simulation studies demonstrate the effectiveness of the developed method in two real-world safety-critical scenarios: keeping the state trajectories of a given system within a given set, and obstacle avoidance.

Paper Structure (21 sections, 6 theorems, 83 equations, 9 figures, 1 table)


Key Result

Theorem 1

[SCC.Mazumdar.ea1974] Let $\mathcal{X}$ be a closed convex subset of $\mathbb{R}^{n}$. For any $\hat{x} \in \mathbb{R}^{n}$, there exists a unique element $\mathop{\mathrm{\mathbf{Pr}}}\nolimits(\hat{x}) \in \mathcal{X}$ such that $\left\|\hat{x} - \mathop{\mathrm{\mathbf{Pr}}}\nolimits(\hat{x})\right\| \dots$
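
Theorem 1 concerns the projection of a state estimate onto a closed convex set. As a minimal sketch, assume that $\mathcal{X}$ is an axis-aligned box; this is an assumption made here for illustration only, and the paper may use a different set. For a box, the Euclidean projection reduces to componentwise clipping. The function name `project_onto_box` and the numerical values are hypothetical.

```python
import math


def project_onto_box(x_hat, lower, upper):
    """Euclidean projection of x_hat onto the box {x : lower <= x <= upper}.

    Assumption: the closed convex set is an axis-aligned box, so the
    projection is obtained by clipping each coordinate independently.
    """
    if not (len(x_hat) == len(lower) == len(upper)):
        raise ValueError("x_hat, lower, and upper must have the same length")
    if any(lo > hi for lo, hi in zip(lower, upper)):
        raise ValueError("each lower bound must not exceed its upper bound")
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x_hat, lower, upper)]


def distance(a, b):
    """Euclidean distance between two points given as lists."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# Hypothetical state estimate lying outside the box in two coordinates.
x_hat = [1.5, -0.2, 0.3]
lower = [-1.0, 0.0, -1.0]
upper = [1.0, 1.0, 1.0]

projected = project_onto_box(x_hat, lower, upper)
print(projected)
```

A point already inside the box is returned unchanged, and no other point of the box is closer to `x_hat` than `projected`.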

Figures (9)

  • Figure 1: Trajectories of the state and estimated state for the system in \ref{eq:simDyn} when the safe RL framework from SCC.Cohen.Belta.ea2020 is used to solve the obstacle avoidance problem in Section \ref{subsection:simStudy2}. While the state estimate trajectory $\hat{x}(t)$ remains outside the obstacle boundary, the actual state trajectory $x(t)$ breaches it. The failure of the actual system to avoid the obstacle under a controller that uses state estimates highlights the limitations of relying solely on control barrier functions (CBFs) to guarantee the safety of an output-feedback nonlinear system unless the CBF is augmented with a robustifying term.
  • Figure 2: Ensuring safety by staying within a given set.
  • Figure 3: Obstacle avoidance.
  • Figure 4: The trajectories of the actual state $x$ and estimated state $\hat{x}$ for ensuring safety within a given set.
  • Figure 5: The trajectories of the estimated critic NN weights for ensuring safety within a given set.
  • ...and 4 more figures

Theorems & Definitions (14)

  • Remark 1
  • Theorem 1
  • Theorem 2
  • proof
  • Remark 2
  • Definition 1
  • Theorem 3
  • Definition 2
  • Lemma 1
  • proof
  • ...and 4 more