Critic as Lyapunov function (CALF): a model-free, stability-ensuring agent

Pavel Osinenko; Grigory Yaremenko; Roman Zashchitin; Anton Bolychev; Sinan Ibrahim; Dmitrii Dobriborsci

TL;DR

This paper introduces Critic As Lyapunov Function (CALF), a model-free reinforcement learning agent that stabilizes the environment (that is, the dynamical system) online. CALF may be considered a viable approach to fusing classical control with reinforcement learning.

Abstract

This work presents and showcases a novel reinforcement learning agent called Critic As Lyapunov Function (CALF), which is model-free and ensures online stabilization of the environment, in other words, of the dynamical system. Online means that the environment is stabilized in every learning episode. As demonstrated in a case study with a mobile robot simulator, this greatly improves the overall learning performance. The base actor-critic scheme of CALF is analogous to SARSA. In our studies, SARSA itself did not succeed in reaching the target. A modified version, called SARSA-m here, did succeed in some learning scenarios, yet CALF greatly outperformed it. CALF was also shown to improve a nominal stabilizer provided to it. In summary, the presented agent may be considered a viable approach to fusing classical control with reinforcement learning. Competing approaches are mostly either offline or model-based, for instance those that fuse model-predictive control into the agent.
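The abstract likens the base actor-critic scheme of CALF to SARSA. For orientation, the textbook tabular SARSA update can be sketched as follows. This is the standard rule, not the paper's critic: the table `q`, the step size `alpha`, the discount factor `gamma`, and the reward convention are all illustrative assumptions.

```python
from collections import defaultdict


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """Apply one on-policy temporal-difference (SARSA) update to a tabular critic.

    The target uses the action actually taken in the next state, which is
    what distinguishes SARSA from Q-learning (which would take a max).
    """
    td_target = reward + gamma * q[(next_state, next_action)]
    td_error = td_target - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


# Hypothetical single transition: state 0, action 1, reward -1, then state 1, action 0.
q = defaultdict(float)  # unseen state-action pairs start at zero
td_error = sarsa_update(q, state=0, action=1, reward=-1.0,
                        next_state=1, next_action=0)
```

Plain SARSA of this kind carries no stability guarantee, which is consistent with the abstract's observation that it failed to reach the target in the authors' studies.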

Paper Structure (9 sections, 1 theorem, 39 equations, 3 figures, 1 algorithm)

Background and problem statement
Problem statement
Related work
Approach
Modified SARSA
Analysis
Case study
System description
Discussion of results and conclusion

Key Result

Theorem 1

Consider the problem eqn_ocproblem and Algorithm alg_calfq. Let $\pi_t$ denote the policy generated by Algorithm alg_calfq. If the policy $\pi_{0}$ is a stabilizer, then $\pi_t$ is a stabilizer. If $\pi_{0}$ is a uniform stabilizer, then so is $\pi_t$.
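One generic way in which an agent can inherit stability from a stabilizer $\pi_0$ is to act on its learned policy only while a critic value keeps decreasing along the observed states, and to fall back on $\pi_0$ otherwise. The toy sketch below illustrates that idea only. It is not Algorithm alg_calfq: the scalar system, the fixed critic, the deliberately bad learned policy, and the constants `DT` and `NU` are all hypothetical. The check uses only observed states and critic values, so no system model enters the decision.

```python
DT = 0.1   # step size of the toy system x_next = x + DT * u (assumption)
NU = 1e-3  # required critic decrease between accepted steps (assumption)


def nominal_policy(x):
    """Hypothetical stabilizer pi_0 for the toy system."""
    return -x


def learned_policy(x):
    """Hypothetical, deliberately destabilizing stand-in for a learned actor."""
    return 0.5 * x


def critic(x):
    """Hypothetical fixed critic; a real agent would learn it."""
    return x * x


def rollout(x0, steps, safeguarded=True):
    """Simulate the toy system, optionally guarding the learned policy."""
    x = x0
    certified = float("inf")  # last critic value that passed the decrease check
    for _ in range(steps):
        value = critic(x)
        if not safeguarded or value <= certified - NU:
            # Critic decreased enough since the last accepted step: trust the actor.
            certified = value
            u = learned_policy(x)
        else:
            # Decrease check failed: fall back on the stabilizer.
            u = nominal_policy(x)
        x = x + DT * u  # the environment step; the agent never inspects this map
    return x


x_safe = rollout(1.0, 200)                       # guarded agent
x_unsafe = rollout(1.0, 200, safeguarded=False)  # learned policy alone
```

In this toy setting the guarded rollout is driven toward the origin even though the learned policy alone diverges, because every accepted step must be preceded by enough fallback steps to restore the critic decrease.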

Figures (3)

Figure 1: Robot kinematics and its frames of interest.
Figure 2: Learning curves obtained from 25 random number generator seeds.
Figure 3: The robot trajectories in the best learning episodes over 25 seeds.

Theorems & Definitions (6)

Definition 1
Remark 1
Theorem 1
Remark 2
Remark 3
Remark 4
