Critic as Lyapunov function (CALF): a model-free, stability-ensuring agent

Pavel Osinenko, Grigory Yaremenko, Roman Zashchitin, Anton Bolychev, Sinan Ibrahim, Dmitrii Dobriborsci

TL;DR

Critic As Lyapunov Function (CALF) is a novel model-free reinforcement learning agent that ensures online stabilization of the environment, in other words, of the dynamical system. It may be considered a viable approach to fusing classical control with reinforcement learning.

Abstract

This work presents and showcases a novel reinforcement learning agent called Critic As Lyapunov Function (CALF), which is model-free and ensures online stabilization of the environment, in other words, of the dynamical system. Online means that the environment is stabilized in every learning episode. As demonstrated in a case study with a mobile robot simulator, this greatly improves the overall learning performance. The base actor-critic scheme of CALF is analogous to SARSA. In our studies, SARSA itself never succeeded in reaching the target, whereas a modified version, called SARSA-m here, did succeed in some learning scenarios. Still, CALF greatly outperformed SARSA-m. CALF was also demonstrated to improve a nominal stabilizer provided to it. In summary, the presented agent may be considered a viable approach to fusing classical control with reinforcement learning. Competing approaches are mostly either offline or model-based, for instance, those that fuse model-predictive control into the agent.
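For readers unfamiliar with the SARSA baseline mentioned in the abstract, the following is a minimal tabular sketch of the standard on-policy SARSA update. It is not the paper's implementation: the table-based critic, the function name `sarsa_update`, and the values of `step_size` and `discount` are assumptions chosen purely for illustration, and the paper's critic is presumably a parametric model rather than a table.

```python
from collections import defaultdict


def sarsa_update(q, state, action, reward, next_state, next_action,
                 step_size=0.1, discount=0.9):
    """One on-policy SARSA step: move Q(s, a) toward r + discount * Q(s', a')."""
    # The target uses the action actually taken next (on-policy),
    # not the greedy action as in Q-learning.
    td_target = reward + discount * q[(next_state, next_action)]
    td_error = td_target - q[(state, action)]
    q[(state, action)] += step_size * td_error
    return td_error


# Hypothetical single transition; all numbers are illustrative.
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
td_error = sarsa_update(q_table, state=0, action=1, reward=-1.0,
                        next_state=1, next_action=0)
```

Nothing in this plain update constrains how the state evolves while learning, which is the gap the abstract says CALF addresses through online stabilization.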

Paper Structure

This paper contains 9 sections, 1 theorem, 39 equations, 3 figures, 1 algorithm.

Key Result

Theorem 1

Consider the problem eqn_ocproblem and Algorithm alg_calfq. Let $\pi_t$ denote the policy generated by Algorithm alg_calfq. If the policy $\pi_{0}$ is a stabilizer, then $\pi_t$ is also a stabilizer. If $\pi_{0}$ is a uniform stabilizer, then so is $\pi_t$.
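To give some intuition for how a learned policy might inherit stability from $\pi_{0}$, here is a toy sketch of one possible safeguard. The learned action is used only when a critic value has decreased sufficiently since the last accepted step, and the nominal stabilizer is applied otherwise. This excerpt does not spell out Algorithm alg_calfq, so everything below is an assumption made for illustration rather than the paper's method: the decrease check, the quadratic critic, the scalar environment, the deliberately unreliable "learned" action, and all names and constants.

```python
def safeguarded_action(critic_value, reference_value, learned_action,
                       nominal_action, min_decrease=1e-3):
    """Use the learned action only if the critic dropped enough since the last acceptance."""
    if critic_value <= reference_value - min_decrease:
        return learned_action, critic_value, True
    # Check failed: fall back on the nominal stabilizer and keep the old reference.
    return nominal_action, reference_value, False


def run_episode(steps=50, x0=2.0):
    x = x0
    reference = float("inf")  # no critic value has been accepted yet
    accepted_values = []
    fallbacks = 0
    for k in range(steps):
        critic_value = x * x  # hypothetical critic evaluated at the observed state
        # Hypothetical learned action, deliberately destabilizing on odd steps.
        learned = -3.0 * x if k % 2 == 0 else 4.0 * x
        nominal = -x  # hypothetical nominal stabilizer pi_0
        u, reference, accepted = safeguarded_action(
            critic_value, reference, learned, nominal)
        if accepted:
            accepted_values.append(critic_value)
        else:
            fallbacks += 1
        # Toy environment step; the agent never uses this equation to decide.
        x = x + 0.5 * u
    return x, accepted_values, fallbacks


final_x, accepted_values, fallbacks = run_episode()
```

In this toy run the accepted critic values form a strictly decreasing sequence, and the nominal action contracts the state whenever the check fails, so the state still approaches the origin despite the unreliable learned action. The sketch illustrates the flavor of the statement only and proves nothing about the actual algorithm.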

Figures (3)

  • Figure 1: Robot kinematics and its frames of interest.
  • Figure 2: Learning curves obtained from 25 random number generator seeds.
  • Figure 3: The robot trajectories in the best learning episodes over 25 seeds.

Theorems & Definitions (6)

  • Definition 1
  • Remark 1
  • Theorem 1
  • Remark 2
  • Remark 3
  • Remark 4