Adaptive Actor-Critic Based Optimal Regulation for Drift-Free Uncertain Nonlinear Systems
Ashwin P. Dani, Shubhendu Bhasin
TL;DR
This work tackles optimal regulation for drift-free nonlinear systems with unknown input gain matrices $g(x,\theta)$ by formulating a continuous-time adaptive actor-critic (AAC) reinforcement learning controller. It employs concurrent learning to identify the constant parameter vector $\theta$ in $g(x,\theta)$ while critic and actor NNs approximate the value function $V^*(\bar{x})$ and the optimal policy $u^*$, guided by the Bellman error $\delta$. A Lyapunov-based analysis shows the closed-loop signals are uniformly ultimately bounded (UUB), with a finite-excitation condition ensuring parameter convergence and a sigma-modification safeguard when excitation is incomplete. Simulation studies on image-based visual servoing (IBVS) and wheeled mobile robots (WMR) validate near-optimal regulation, bounded weights, and convergence of parameter estimates, demonstrating practical applicability to robotics with uncertain input gain.
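To make the role of the Bellman error $\delta$ concrete, the following minimal sketch evaluates it for a drift-free system $\dot{x} = g(x,\theta)u$. It assumes a quadratic running cost $x^\top Q x + u^\top R u$ and the closed-form policy $u = -\tfrac{1}{2} R^{-1} \hat{g}^\top \nabla V$ that is standard for such costs; neither is stated in the text above, and the function names `policy` and `bellman_error` are hypothetical. When the value gradient and the input gain estimate are both exact, $\delta$ vanishes; an error in the gain estimate leaves a nonzero residual.

```python
import numpy as np

def policy(grad_V, g_hat, R):
    """Assumed closed-form policy u = -1/2 R^{-1} g_hat^T grad_V (quadratic cost)."""
    return -0.5 * np.linalg.solve(R, g_hat.T @ grad_V)

def bellman_error(x, grad_V, g_hat, Q, R):
    """Bellman residual for a drift-free system: grad_V . (g_hat u) + running cost."""
    u = policy(grad_V, g_hat, R)
    return grad_V @ (g_hat @ u) + x @ Q @ x + u @ R @ u

# Hypothetical scalar example: xdot = theta * u with theta = 2, Q = R = 1.
# Under these assumptions V*(x) = x^2 / 2, so the value gradient at x is x.
x = np.array([1.5])
Q = np.eye(1)
R = np.eye(1)
grad_V = x.copy()

delta_exact = bellman_error(x, grad_V, np.array([[2.0]]), Q, R)  # true gain
delta_wrong = bellman_error(x, grad_V, np.array([[1.0]]), Q, R)  # mis-estimated gain
print(delta_exact, delta_wrong)
```

In an actor-critic scheme of this kind, a residual like `delta_wrong` is the signal that drives the weight updates, which is why the quality of the gain estimate matters to the learning loop.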
Abstract
In this paper, a continuous-time adaptive actor-critic reinforcement learning (RL) controller is developed for drift-free nonlinear systems. Practical examples of such systems are image-based visual servoing (IBVS) and wheeled mobile robots (WMR), where the system dynamics have no drift term and include a parametric uncertainty in the control effectiveness matrix. This uncertainty in the input term makes it difficult to develop a continuous-time RL controller using existing methods. An actor-critic, or synchronous policy iteration (PI)-based, RL controller is presented together with a concurrent learning (CL)-based parameter update law that estimates the unknown parameters of the control effectiveness matrix. The controller minimizes an infinite-horizon value function by regulating the current states to the desired states with near-optimal control effort. The proposed controller guarantees closed-loop stability, and simulation results on IBVS and WMR examples validate the proposed theory.
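The idea behind a CL-based parameter update law can be illustrated with a minimal sketch. It is not the update law of this paper. It assumes that the input term is linear in the unknown parameters, $g(x,\theta)u = Y(x,u)\theta$, that state derivatives are available at the recorded points, and it uses a hypothetical unicycle-like regressor with two unknown gain scalings. The names `regressor` and `cl_estimate` are illustrative. The estimate is driven by a stack of recorded data, and convergence relies on $\sum_j Y_j^\top Y_j$ being full rank.

```python
import numpy as np

def regressor(x, u):
    """Y(x, u) with g(x, theta) u = Y(x, u) theta (assumed linear parameterization)."""
    phi = x[2]
    return np.array([[np.cos(phi) * u[0], 0.0],
                     [np.sin(phi) * u[0], 0.0],
                     [0.0, u[1]]])

def cl_estimate(theta_true, theta0, gamma=1.0, dt=0.01, steps=2000):
    """Euler-integrated concurrent-learning update over a fixed history stack."""
    stack = []
    for j in range(5):
        x = np.array([0.0, 0.0, 0.3 * j])      # recorded state (heading varies)
        u = np.array([1.0, 0.5 * (-1) ** j])   # recorded input
        Y = regressor(x, u)
        xdot = Y @ theta_true                  # assumes xdot is measured exactly
        stack.append((Y, xdot))

    # Excitation check: smallest eigenvalue of sum_j Y_j^T Y_j must be positive.
    S = sum(Y.T @ Y for Y, _ in stack)
    lam_min = np.linalg.eigvalsh(S)[0]

    theta_hat = np.array(theta0, dtype=float)
    for _ in range(steps):
        # theta_hat_dot = gamma * sum_j Y_j^T (xdot_j - Y_j theta_hat)
        correction = sum(Y.T @ (xdot - Y @ theta_hat) for Y, xdot in stack)
        theta_hat = theta_hat + dt * gamma * correction
    return theta_hat, lam_min

theta_hat, lam_min = cl_estimate(np.array([0.8, 1.2]), np.zeros(2))
print(theta_hat, lam_min)
```

Because the recorded data are reused at every step, the estimate keeps converging even when the current trajectory is no longer exciting, which is the main appeal of concurrent learning over a purely instantaneous gradient update.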
