Neural Lyapunov Function Approximation with Self-Supervised Reinforcement Learning

Luc McCutcheon, Bahman Gharesifard, Saber Fallah

TL;DR

This paper tackles the challenge of deriving valid Lyapunov functions for nonlinear systems by introducing SACLA, a self-supervised reinforcement learning framework that jointly learns a neural Lyapunov function (NLF), a probabilistic World Model, and a goal-conditioned policy. By augmenting the Soft Actor-Critic objective with a Lyapunov risk term, SACLA encourages exploration of unstable regions, expanding the region of attraction (ROA) while maintaining stability through an offline, off-policy, data-efficient training loop. The method extends Almost Lyapunov Critics to a data-driven, off-policy, goal-conditioned setting and achieves a larger ROA and more accurate Lyapunov functions on standard robotic tasks, with the World Model enabling comprehensive stability analyses. The approach offers a scalable, data-efficient path to learning stable controllers for highly nonlinear systems, with potential implications for safety-critical applications such as autonomous robotics and aerospace, where stability guarantees are valuable.
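The exact form of SACLA's augmented objective is not reproduced on this page, so the sketch below rests on an assumption: the Lyapunov risk term acts as an intrinsic bonus added to the environment reward, so that Soft Actor-Critic is drawn toward transitions where the current Lyapunov candidate fails to decrease. The weighting coefficient `beta` is hypothetical (Figure 4 mentions a SACLA configuration with β = 0.5, but its role is not defined here), and the quadratic `V` is a hypothetical stand-in for the learned NLF.

```python
import numpy as np

def lyapunov_violation(v_s, v_next):
    """Per-transition violation, positive when V fails to decrease.

    Discrete-time stand-in for a positive Lie derivative: max(0, V(s') - V(s)).
    """
    return np.maximum(0.0, v_next - v_s)

def augmented_reward(r_env, v_s, v_next, beta=0.5):
    """Hypothetical SACLA-style reward: environment reward plus a weighted
    exploration bonus that grows where the Lyapunov candidate is violated."""
    return r_env + beta * lyapunov_violation(v_s, v_next)

def V(s, g):
    # Hypothetical quadratic Lyapunov candidate centred on the goal g.
    return np.sum((s - g) ** 2, axis=-1)

g = np.zeros(2)
s = np.array([[1.0, 0.0], [0.5, 0.5]])
s_next = np.array([[0.5, 0.0], [1.0, 0.5]])  # second transition moves away from g
r_env = np.array([-1.0, -1.0])

r_aug = augmented_reward(r_env, V(s, g), V(s_next, g))
print(r_aug)
```

The first transition decreases `V` and keeps its reward of -1.0; the second increases `V` by 0.75 and receives -1.0 + 0.5 × 0.75 = -0.625, so the policy is nudged toward collecting data where the NLF is currently wrong.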

Abstract

Control Lyapunov functions are traditionally used to design controllers that ensure convergence to a desired state, yet deriving these functions for nonlinear systems remains challenging. This paper presents a novel, sample-efficient method for the neural approximation of nonlinear Lyapunov functions that uses self-supervised Reinforcement Learning (RL) to improve training-data generation, particularly in regions of the state space that the current approximation represents inaccurately. The proposed approach employs a data-driven World Model to train Lyapunov functions from off-policy trajectories. The method is validated on both standard and goal-conditioned robotic tasks, demonstrating faster convergence and higher approximation accuracy than the state-of-the-art neural Lyapunov approximation baseline. The code is available at: https://github.com/CAV-Research-Lab/SACLA.git
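One way to picture how a probabilistic World Model lets a Lyapunov function be trained from off-policy data is to estimate the expected one-step change E[V(s')] - V(s) by sampling predicted next states, for any stored state and action. The sketch below is a minimal illustration under stated assumptions: the Gaussian `world_model` (a damped linear system) and the quadratic `V` are hypothetical stand-ins for the learned networks, and `expected_delta_v` is an illustrative helper, not a function from the SACLA repository.

```python
import numpy as np

rng = np.random.default_rng(0)

def world_model(s, a):
    """Hypothetical probabilistic world model returning mean and std of s'.
    A learned network would go here; a damped linear system is used instead."""
    mean = 0.9 * s + 0.1 * a
    std = 0.01 * np.ones_like(s)
    return mean, std

def V(s, g):
    # Hypothetical quadratic Lyapunov candidate; the paper learns V with a network.
    return np.sum((s - g) ** 2, axis=-1)

def expected_delta_v(s, a, g, n_samples=256):
    """Monte Carlo estimate of E[V(s')] - V(s) under the world model.
    Negative values mean V is expected to decrease (a stabilising step)."""
    mean, std = world_model(s, a)
    # Sample next states: shape (n_samples, batch, state_dim).
    samples = mean + std * rng.standard_normal((n_samples,) + mean.shape)
    return V(samples, g).mean(axis=0) - V(s, g)

# A small off-policy batch, e.g. drawn from a replay buffer.
s = np.array([[1.0, 1.0], [0.2, -0.3]])
a = np.zeros_like(s)
g = np.zeros(2)
delta_v = expected_delta_v(s, a, g)
print(delta_v)
```

Because the model is queried rather than the environment, the same stored states can be re-evaluated each time `V` or the policy changes, which is what makes off-policy training of the Lyapunov function data-efficient in this toy setting.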

Paper Structure

This paper contains 15 sections, 13 equations, 4 figures, and 1 algorithm.

Figures (4)

  • Figure 1: ROA for SACLA in the FetchReach-v2 environment for 512 points within 2 units (global coordinates) of the goal location, with the final performance percentage and standard deviation. Arrows indicate the action vector taken at each point. Blue arrows have negative Lie derivatives, whereas red arrows have positive Lie derivatives.
  • Figure 2: Phase plot of angular displacement ($\theta$) vs angular velocity ($\dot{\theta}$) for SACLA in the InvertedPendulum-v4 environment with the final performance percentage and standard deviation. Blue points represent negative Lie derivatives and red points indicate positive Lie derivatives; the intensity of the color indicates the magnitude of the Lie derivative.
  • Figure 3: Percentage of negative Lie derivatives during training for different objective functions, with error bars indicating the standard deviation over 15 test seeds.
  • Figure 4: SACLA ($\beta=0.5$) Lyapunov value distribution over time with an example trajectory on the surface $M$ where $N=100$ for the FetchReach-v2 environment.
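Figures 1 and 3 report the percentage of sampled states with a negative Lie derivative, with a standard deviation across seeds. The sketch below shows one way such a metric could be computed, assuming the "within 2 units" region is a ball around the goal and using a discrete-time difference V(s') - V(s) in place of the Lie derivative. The proportional `policy`, single-integrator `step`, quadratic `V`, and goal position are all hypothetical stand-ins for the learned policy, the FetchReach-v2 dynamics, and the learned NLF.

```python
import numpy as np

def sample_ball(rng, center, radius, n):
    """Uniformly sample n points inside a ball (assumed evaluation region)."""
    d = center.shape[0]
    direction = rng.standard_normal((n, d))
    direction /= np.linalg.norm(direction, axis=1, keepdims=True)
    r = radius * rng.random(n) ** (1.0 / d)  # radius ~ u^(1/d) gives uniform volume density
    return center + direction * r[:, None]

def V(s, g):
    # Hypothetical quadratic Lyapunov candidate.
    return np.sum((s - g) ** 2, axis=-1)

def policy(s, g):
    # Hypothetical proportional controller standing in for the learned policy.
    return 0.5 * (g - s)

def step(s, a):
    # Hypothetical single-integrator dynamics standing in for the environment.
    return s + a

def percent_negative(rng, g, n=512, radius=2.0):
    """Percentage of sampled states whose one-step change in V is negative."""
    s = sample_ball(rng, g, radius, n)
    delta_v = V(step(s, policy(s, g)), g) - V(s, g)
    return 100.0 * np.mean(delta_v < 0)

g = np.array([1.3, 0.7, 0.5])  # hypothetical goal position
scores = [percent_negative(np.random.default_rng(seed), g) for seed in range(15)]
print(f"{np.mean(scores):.1f}% +/- {np.std(scores):.1f}")
```

With this toy controller the distance to the goal halves every step, so every sampled state yields a negative difference and the score is 100%; a learned policy and NLF would generally score lower, which is what the red regions in Figures 1 and 2 show.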

Theorems & Definitions (3)

  • Definition 1: $\epsilon$-stability
  • Definition 2: Lyapunov Risk
  • Remark 3
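Definition 2 is listed here without its formula. In the neural Lyapunov literature, the empirical Lyapunov risk is commonly written as a mean of hinge penalties for V(s) < 0 and for V failing to decrease, plus a term anchoring V at the equilibrium to zero. The sketch below assumes SACLA's definition has a similar shape, with the Lie derivative replaced by a discrete-time difference over observed or model-predicted transitions; the precise terms and weights in the paper may differ.

```python
import numpy as np

def lyapunov_risk(v_s, v_next, v_goal):
    """Empirical Lyapunov risk over a batch of transitions (assumed form).

    v_s:    V at sampled states, shape (N,)
    v_next: V at successor states, shape (N,)
    v_goal: V at the goal / equilibrium, scalar
    """
    positivity = np.maximum(0.0, -v_s)        # penalise V(s) < 0
    decrease = np.maximum(0.0, v_next - v_s)  # penalise V failing to decrease
    return np.mean(positivity + decrease) + v_goal ** 2  # anchor V(goal) at 0

v_s = np.array([1.0, -0.2, 0.5])
v_next = np.array([0.5, -0.1, 0.9])
risk = lyapunov_risk(v_s, v_next, v_goal=0.1)
print(risk)
```

Here the first state satisfies both conditions, the second is penalised for V < 0 (0.2) and for an increase (0.1), and the third for an increase (0.4), giving a mean of 0.7 / 3 plus 0.1² = 0.01 from the goal term. Minimising this quantity over the NLF parameters drives the candidate toward a valid Lyapunov function on the sampled states.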