Neural Lyapunov and Optimal Control

Daniel Layeghi; Steve Tonneau; Michael Mistry

TL;DR

This paper introduces an optimal control theoretic, learning-based method that uses the Hamilton-Jacobi-Bellman (HJB) equation and first-order gradients to learn optimal time-varying value functions, and therefore policies. It solves simple continuous control tasks robustly with simple, parsimonious costs, whereas reinforcement learning without reward and environment tuning converges poorly on them.

Abstract

Despite impressive results, reinforcement learning (RL) suffers from slow convergence and requires a wide variety of tuning strategies. In this paper, we investigate how well RL algorithms perform on simple continuous control tasks. We show that, without reward and environment tuning, RL converges poorly. In turn, we introduce an optimal control (OC) theoretic, learning-based method that solves the same problems robustly with simple, parsimonious costs. We use the Hamilton-Jacobi-Bellman (HJB) equation and first-order gradients to learn optimal time-varying value functions, and therefore policies. We show that relaxing our objective yields time-varying Lyapunov functions, which further verifies our approach by providing guarantees over a compact set of initial conditions. We compare our method to Soft Actor Critic (SAC) and Proximal Policy Optimisation (PPO). In this comparison, we solve all tasks and never underperform in task cost, and at the point of our convergence we outperform SAC and PPO by, in the best case, 4 and 2 orders of magnitude, respectively.
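The abstract's core idea, learning a time-varying value function V(x, t) by driving the HJB residual dV/dt + min_u [l(x, u) + dV/dx · f(x, u)] towards zero, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hypothetical scalar linear-quadratic problem whose value function is known in closed form, and it uses central finite differences as a stand-in for the automatic differentiation a neural value function would use. The Lyapunov check encodes one plausible reading of the relaxation mentioned in the abstract: along the closed loop, the total derivative of V should be non-positive. All names (`value`, `hjb_residual`, `lyapunov_derivative`, the constants `A`, `B`, `Q`, `R`, `T`, `P_T`) are hypothetical.

```python
import numpy as np

# Hypothetical scalar test problem (an assumption, not taken from the paper):
#   dynamics dx/dt = A*x + B*u, running cost Q*x**2 + R*u**2, terminal cost P_T*x**2
A, B, Q, R = 0.0, 1.0, 0.0, 1.0
T, P_T = 1.0, 2.0

def value(x, t):
    """Closed-form finite-horizon value V(x, t) = P(t) * x**2.

    With A = 0, B = 1, Q = 0, R = 1 the Riccati ODE reduces to dP/dt = P**2,
    whose solution with P(T) = P_T is P(t) = 1 / (1/P_T + T - t).
    """
    return x**2 / (1.0 / P_T + T - t)

def grads(V, x, t, h=1e-5):
    """Central finite-difference estimates of dV/dt and dV/dx (stand-in for autodiff)."""
    V_t = (V(x, t + h) - V(x, t - h)) / (2 * h)
    V_x = (V(x + h, t) - V(x - h, t)) / (2 * h)
    return V_t, V_x

def optimal_control(V_x):
    """Minimiser over u of Q*x**2 + R*u**2 + V_x*(A*x + B*u)."""
    return -B * V_x / (2 * R)

def hjb_residual(V, x, t):
    """HJB residual dV/dt + min_u [l(x, u) + dV/dx * f(x, u)]; zero for the optimal V."""
    V_t, V_x = grads(V, x, t)
    u = optimal_control(V_x)
    return V_t + Q * x**2 + R * u**2 + V_x * (A * x + B * u)

def lyapunov_derivative(V, x, t):
    """Total time derivative V_t + V_x * f(x, u*) along the closed loop.

    A Lyapunov-style (relaxed) condition only asks that this be <= 0.
    """
    V_t, V_x = grads(V, x, t)
    u = optimal_control(V_x)
    return V_t + V_x * (A * x + B * u)

# Evaluate both quantities on a small grid of states and times.
xs = np.linspace(-2.0, 2.0, 9)
ts = np.linspace(0.0, 0.9, 4)
residuals = np.array([[hjb_residual(value, x, t) for x in xs] for t in ts])
derivs = np.array([[lyapunov_derivative(value, x, t) for x in xs] for t in ts])
print("max |HJB residual|:", np.abs(residuals).max())
print("max closed-loop dV/dt:", derivs.max())
```

In a learning setting, `value` would presumably be a neural network and the mean squared residual over sampled (x, t) would act as the training loss. Here the residual is near zero by construction, and the closed-loop derivative equals -(Q*x**2 + R*u**2), so it is non-positive everywhere on the grid.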

Paper Structure

This paper contains 22 sections, 13 equations, 3 figures and 1 table.

Introduction
Related Work
RL robustness
OC theoretic policy/value learning
Preliminaries
Optimal Control
Neural ODEs
Learning Lyapunov and value functions
Value functions
Lyapunov functions
Neural ODE
Empirical results
Criteria
Environments and solver setting
Value Function Results
...and 7 more sections

Figures (3)

Figure 1: Compact stability region for double integrator, computed by Neural Lyapunov Control.
Figure 2: Top row: Constraint satisfaction loss for value and Lyapunov function constraints. Middle row: Trajectory cost using our method. Bottom row: SAC and PPO trajectory cost. Due to high values, SAC costs are scaled for visualisation.
Figure 3: Cartpole balancing Lyapunov trajectories.
