Adaptive Actor-Critic Based Optimal Regulation for Drift-Free Uncertain Nonlinear Systems

Ashwin P. Dani, Shubhendu Bhasin

TL;DR

This work tackles optimal regulation for drift-free nonlinear systems with unknown input gain matrices $g(x,\theta)$ by formulating a continuous-time adaptive actor-critic (AAC) reinforcement learning controller. It employs concurrent learning to identify the constant parameter vector $\theta$ in $g(x,\theta)$ while critic and actor NNs approximate the value function $V^*(\bar{x})$ and the optimal policy $u^*$, guided by the Bellman error $\delta$. A Lyapunov-based analysis shows the closed-loop signals are uniformly ultimately bounded (UUB), with a finite-excitation condition ensuring parameter convergence and a sigma-modification safeguard when excitation is incomplete. Simulation studies on image-based visual servoing (IBVS) and wheeled mobile robots (WMR) validate near-optimal regulation, bounded weights, and convergence of parameter estimates, demonstrating practical applicability to robotics with uncertain input gain.
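The quantities named above can be tied together with the standard infinite-horizon formulation. The block below is a sketch under assumptions, not a copy of the paper's equations: the state penalty $Q(x)$, the control weight $R$, the basis $\phi(x)$, and the use of $x$ in place of $\bar{x}$ are notation introduced here for illustration, and the paper's own definitions may differ in detail.

```latex
% Drift-free dynamics with uncertain input gain (theta constant, unknown)
\dot{x} = g(x,\theta)\,u

% Infinite-horizon cost with assumed penalties Q(x) \ge 0 and R = R^\top > 0
V(x(t)) = \int_{t}^{\infty} \left( Q(x(\tau)) + u(\tau)^\top R\,u(\tau) \right) d\tau

% Hamilton-Jacobi-Bellman equation and the resulting optimal policy
0 = \min_{u} \left[ \nabla V^{*}(x)\, g(x,\theta)\,u + Q(x) + u^\top R\,u \right],
\qquad
u^{*}(x) = -\tfrac{1}{2}\, R^{-1} g(x,\theta)^\top \nabla V^{*}(x)^\top

% Critic and actor approximations with weight estimates \hat{W}_c, \hat{W}_a
\hat{V}(x) = \hat{W}_c^\top \phi(x),
\qquad
\hat{u}(x) = -\tfrac{1}{2}\, R^{-1} g(x,\hat{\theta})^\top \nabla\phi(x)^\top \hat{W}_a

% Bellman error: residual of the HJB equation under the estimates
\delta = \hat{W}_c^\top \nabla\phi(x)\, g(x,\hat{\theta})\,\hat{u} + Q(x) + \hat{u}^\top R\,\hat{u}
```

Because the true $\theta$ is unknown, the Bellman error can only be evaluated with the estimate $\hat{\theta}$, which is why the parameter identifier and the actor-critic weights have to be analyzed together.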

Abstract

In this paper, a continuous-time adaptive actor-critic reinforcement learning (RL) controller is developed for drift-free nonlinear systems. Practical examples of such systems are image-based visual servoing (IBVS) and wheeled mobile robots (WMR), where the system dynamics include a parametric uncertainty in the control effectiveness matrix and have no drift term. The uncertainty in the input term poses a challenge for developing a continuous-time RL controller using existing methods. An actor-critic, or synchronous policy iteration (PI)-based, RL controller is presented with a concurrent learning (CL)-based parameter update law for estimating the unknown parameters of the control effectiveness matrix. The infinite-horizon value function minimization objective is achieved by regulating the current states to the desired states with near-optimal control effort. The proposed controller guarantees closed-loop stability, and simulation results on IBVS and WMR examples validate the proposed theory.
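To make the concurrent learning idea concrete, here is a minimal Python sketch of a generic CL estimator for a system that is linear in its unknown parameters, $\dot{x} = Y(x,u)\theta$. It is not the paper's update law (eq:thetaHatDot). The toy regressor, the input signal, the gains, and all function names are hypothetical, and the state derivative is assumed to be measured without noise.

```python
import numpy as np


def regressor(x, u):
    """Hypothetical regressor Y(x, u) with x_dot = Y(x, u) @ theta (toy system)."""
    return np.array([[u[0], 0.0],
                     [0.0, (1.0 + x[0] ** 2) * u[1]]])


def cl_step(theta_hat, stack, gain, dt):
    """One Euler step of a gradient update summed over the recorded history stack."""
    correction = np.zeros_like(theta_hat)
    for Y_j, xdot_j in stack:
        # Prediction error on stored sample j, mapped back to parameter space.
        correction += Y_j.T @ (xdot_j - Y_j @ theta_hat)
    return theta_hat + dt * gain * correction


def run_demo(steps=4000, dt=1e-3, gain=2.0, stack_size=5, record_every=100):
    theta_true = np.array([2.0, 0.5])   # unknown to the estimator
    theta_hat = np.zeros(2)
    x = np.array([1.0, -1.0])
    stack = []                          # recorded (Y_j, x_dot_j) pairs
    for k in range(steps):
        t = k * dt
        u = np.array([np.cos(t), 1.0 + np.sin(t)])  # arbitrary exciting input
        Y = regressor(x, u)
        x_dot = Y @ theta_true          # assumed measurable and noise-free
        # Record a few samples early on; afterwards the stack stays fixed.
        if len(stack) < stack_size and k % record_every == 0:
            stack.append((Y, x_dot))
        theta_hat = cl_step(theta_hat, stack, gain, dt)
        x = x + dt * x_dot              # Euler integration of the true system
    return theta_hat, theta_true
```

Calling `theta_hat, theta_true = run_demo()` should drive the estimate toward the true parameters even though the stack stops growing after the first few samples. The design point is that convergence depends on the stored regressors making $\sum_j Y_j^\top Y_j$ full rank, which is a finite-excitation condition on recorded data and not a persistent-excitation condition on the ongoing trajectory.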

Paper Structure (17 sections, 1 theorem, 45 equations, 4 figures)

Key Result

Theorem 1

Given that Assumptions 1-5 hold and the following sufficient condition is satisfied, the actor-critic controller (eq:ApproxValueControl), along with the model parameter update law in (eq:thetaHatDot) and the critic and actor weight update laws in (eq:wCHatUpdate)-(eq:GammaUpdate), (eq:wAHatUpdate), guarantees that the signals $\bar{x}(t)$, $\tilde{\theta}(t)$, $\tilde{W}_a(t)$ and $\tilde{

Figures (4)

  • Figure 1: IBVS: (a) Regulation errors, (b) Control velocities, (c) Value.
  • Figure 2: IBVS: (a) Parameter estimates, (b) Critic weights, (c) Actor weights.
  • Figure 3: WMR regulation: (a) Regulation errors, (b) Control velocities, (c) Value.
  • Figure 4: WMR regulation: (a) Parameter estimates along with true parameters, (b) Critic weights, (c) Actor weights.

Theorems & Definitions (9)

  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Theorem 1
  • proof
  • Remark 5
  • Remark 6
  • Remark 7