
Decomposing Control Lyapunov Functions for Efficient Reinforcement Learning

Antonio Lopez, David Fridovich-Keil

TL;DR

A control-theoretic approach to improving sample efficiency in RL: we develop a new variety of CLF-like functions, termed Decomposed Control Lyapunov Functions (DCLFs), which a system decomposition technique makes more readily computable in higher-dimensional cases.

Abstract

Recent methods using Reinforcement Learning (RL) have proven successful for training intelligent agents in unknown environments. However, RL has not been applied widely in real-world robotics scenarios, because current state-of-the-art RL methods require large amounts of data to learn a specific task, which makes deploying an agent to collect data in real-world applications unreasonably costly. In this paper, we build on existing work that reshapes the reward function in RL by introducing a Control Lyapunov Function (CLF), which has been demonstrated to reduce sample complexity. This formulation requires knowing a CLF of the system, and because no general method for constructing one exists, identifying a suitable CLF is often a challenge. Existing work can compute low-dimensional CLFs via a Hamilton-Jacobi reachability procedure. However, this class of methods becomes intractable on high-dimensional systems, a problem that we address by using a system decomposition technique to compute what we call Decomposed Control Lyapunov Functions (DCLFs). We use the computed DCLF for reward shaping, which we show improves RL performance. Through multiple examples, we demonstrate the effectiveness of this approach: our method finds a policy that successfully lands a quadcopter using less than half the real-world data required by the state-of-the-art Soft Actor-Critic (SAC) algorithm.
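
To make the reward-shaping idea concrete, here is a minimal Python sketch. It is not the paper's exact formulation: it assumes a generic potential-based shaping term with potential equal to the negative of a CLF-like function, and the names `shaped_reward` and `quadratic_clf` are hypothetical. The quadratic function is only a stand-in for a CLF or DCLF computed by other means.

```python
from typing import Callable, Sequence

State = Sequence[float]


def quadratic_clf(state: State) -> float:
    """Hypothetical placeholder for a CLF-like function: squared distance to the origin."""
    return sum(x * x for x in state)


def shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    clf: Callable[[State], float],
    gamma: float = 0.99,
) -> float:
    """Add a potential-based shaping term with potential -clf (an assumption).

    The bonus clf(state) - gamma * clf(next_state) is positive when the
    transition decreases the CLF-like function, i.e. moves toward the goal.
    """
    return reward + clf(state) - gamma * clf(next_state)


# A step toward the origin earns a bonus; a step away is penalized.
toward = shaped_reward(0.0, [1.0, 0.0], [0.5, 0.0], quadratic_clf, gamma=1.0)
away = shaped_reward(0.0, [0.5, 0.0], [1.0, 0.0], quadratic_clf, gamma=1.0)
```

In this sketch the environment reward is left untouched and only a decrease-based bonus is added, so the shaped signal can be passed to any off-the-shelf RL algorithm, such as SAC, without changing the learner itself.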

Paper Structure

This paper contains 15 sections, 20 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Schematic of our framework. We take a dynamical system model of an autonomous robot and decompose it into several subsystems. We compute a Control Lyapunov-Value Function (CLVF) for each subsystem and take the sum of these CLVFs as our Decomposed Control Lyapunov Function, which we show can be incorporated as reward shaping to accelerate policy learning in a variety of low- and high-dimensional tasks.
  • Figure 2: Learning curves of different RL algorithms run on a Dubins Car simulation. Approaches incorporating Lyapunov functions, SAC+DCLF (ours) and SAC+CLVF, perform better than the SAC baseline. Each epoch consists of 300 simulation steps, i.e., 3 seconds of data. Four different seeds were used in the simulations.
  • Figure 3: Dubins Car after 2 minutes of trajectory data. (Left): Trajectory using our approach with the incorporated DCLF. (Right): Trajectory using the standard SAC algorithm.
  • Figure 4: Learning curves of different RL algorithms run on the Lunar Lander environment. Five different seeds were used in the simulations.
  • Figure 5: Learning curves of different RL algorithms run on the Drone experiment. Each epoch consists of 3 seconds of data. Four different seeds were used in the simulations.
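
The construction described in the Figure 1 caption (decompose the state into subsystems, evaluate one function per subsystem, and sum the results) can be sketched in Python as follows. This is an illustration under stated assumptions: the helper `make_dclf`, the index-based state split, and the quadratic per-subsystem functions are all hypothetical, and the closed-form quadratics merely stand in for subsystem CLVFs that would be computed numerically.

```python
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]
# Each subsystem is (indices of the full state it uses, value function on that sub-state).
Subsystem = Tuple[Sequence[int], Callable[[State], float]]


def make_dclf(subsystems: List[Subsystem]) -> Callable[[State], float]:
    """Build a decomposed function as the sum of per-subsystem value functions."""

    def dclf(state: State) -> float:
        total = 0.0
        for indices, value_fn in subsystems:
            sub_state = [state[i] for i in indices]  # project onto the subsystem
            total += value_fn(sub_state)
        return total

    return dclf


def quadratic(sub_state: State) -> float:
    """Hypothetical stand-in for a subsystem CLVF."""
    return sum(x * x for x in sub_state)


# Assumed 4-D state (x, vx, y, vy), split into two 2-D subsystems.
dclf = make_dclf([((0, 1), quadratic), ((2, 3), quadratic)])
value = dclf([1.0, 2.0, 3.0, 4.0])
```

The point of the split is that each subsystem function depends on only a few state dimensions, so it can be computed separately even when handling the full state at once would be intractable.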
