Deep Reinforcement Learning Graphs: Feedback Motion Planning via Neural Lyapunov Verification

Armin Ghanbarzadeh; Esmaeil Najafi

TL;DR

A feedback motion control algorithm that uses data-driven techniques and neural networks to determine the region of attraction of reinforcement-learning-based controllers.

Abstract

Recent advancements in model-free deep reinforcement learning have enabled efficient agent training. However, challenges arise when determining the region of attraction for these controllers, especially if the region does not fully cover the desired area. This paper addresses this issue by introducing a feedback motion control algorithm that utilizes data-driven techniques and neural networks. The algorithm constructs a graph of connected reinforcement-learning-based controllers, each with its own defined region of attraction. This incremental approach covers a bounded region of interest and creates a trajectory of interconnected nodes that guides the system from an initial state to the goal. Two approaches are presented for connecting nodes within the algorithm. The first is a tree-structured method, which facilitates "point-to-point" control by constructing a tree connecting the initial state to the goal state. The second is a graph-structured method, which enables "space-to-space" control by building a graph within a bounded region, allowing control between arbitrary initial and goal states. The proposed method's performance is evaluated on a first-order dynamic system, in scenarios both with and without obstacles. The results demonstrate the effectiveness of the proposed algorithm in achieving the desired control objectives.
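The tree-structured idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that each controller's region of attraction is a disk of fixed radius (a stand-in for the region a neural Lyapunov function would certify) and that new nodes are drawn uniformly from a square bounded region. The names `build_tree`, `node_sequence`, `radius`, and `bound` are hypothetical.

```python
import math
import random


def build_tree(start, goal, radius=1.5, bound=5.0, max_nodes=1000, seed=0):
    """Grow a tree of controller nodes from the goal until `start` is covered.

    Each node is a (center, parent_index) pair. A disk of fixed `radius`
    stands in for a verified region of attraction (an assumption).
    Returns the node list and the index of a node covering `start`,
    or None if the node budget ran out first.
    """
    rng = random.Random(seed)
    nodes = [(goal, None)]  # the root controller stabilizes the goal state

    def covering(point):
        # Index of the first node whose disk contains `point`, else None.
        for i, (center, _) in enumerate(nodes):
            if math.dist(center, point) <= radius:
                return i
        return None

    while covering(start) is None and len(nodes) < max_nodes:
        sample = (rng.uniform(-bound, bound), rng.uniform(-bound, bound))
        parent = covering(sample)
        # Keep the sample only if an existing controller can take over
        # once the system has been driven to the new node's center.
        if parent is not None:
            nodes.append((sample, parent))
    return nodes, covering(start)


def node_sequence(nodes, entry):
    """Centers visited from the entry node down to the root (the goal)."""
    sequence = []
    while entry is not None:
        center, entry = nodes[entry]
        sequence.append(center)
    return sequence


if __name__ == "__main__":
    tree, entry = build_tree(start=(-4.0, -4.0), goal=(4.0, 4.0))
    print(len(tree), "nodes;", len(node_sequence(tree, entry)), "on the path")
```

Accepting a sample only when its center already lies inside an existing disk mirrors the funnel picture: each controller hands the system over to one that is closer to the goal. A graph-structured variant would additionally record every overlapping neighbor instead of a single parent.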

Paper Structure (14 sections, 12 equations, 8 figures, 4 tables, 3 algorithms)

Introduction
Background
Deep Reinforcement Learning
Lyapunov Neural Network
System Dynamics
Sequential RL controller with Neural Lyapunov certificate
TPC
Design of TPC
Result of TPC
GPC
Design of GPC
Result of GPC
Tree-Structured vs. Graph-Structured Controller
Conclusion

Figures (8)

Figure 1: Illustration of path following with sequential controllers, akin to a ball descending through funnels. Each controller is only active outside the domains of lower controllers. The lowest controller stabilizes the system at the final destination. Each funnel's preimage represents the RoA of the corresponding controller. Adopted from burridge1999sequential.
Figure 2: Dynamics used for simulating the proposed controller. The potential function $h$ is represented on the left side and its gradient that describes the vector field $f$ is depicted on the right side for the range $x_1 \in [-5, 5]$, $x_2 \in [-5, 5]$.
Figure 3: (a) Average cumulative episode reward observed during the training of the RL agent for the TPC. (b) Lyapunov loss trends throughout the training of the Lyapunov neural networks used to compute the region of attraction of each RL controller. The black line represents the average over the 23 controllers, while the shaded region shows the minimum and maximum bounds of the computed rewards and loss, respectively.
Figure 4: The TPC applied to the dynamic system defined in (\ref{eq:system}). (a) Snapshot of the algorithm with 5 nodes. (b) Snapshot with 15 nodes. (c) Initial state reaching the covered space of the tree at 23 nodes, with irrelevant branches pruned. (d) Dynamic system trajectory following the constructed controller tree. Red cross: goal state $x_\text{goal} = [4, 4]^T$; Green cross: starting state $x_\text{start} = [-4, -4]^T$; Black square: bounded region $\mathcal{R}_\text{bound}$; Blue dots: controller node center locations $x_k$; Blue shaded region: individual node regions of attraction; Blue dashed shaded region: overall region of attraction of the tree ($\mathcal{R}_k$); Black lines: connections (edges) between tree nodes; Green line: system trajectory $x(t)$.
Figure 5: The TPC applied to the dynamic system defined in (\ref{eq:system}) with an obstacle of radius 1 at $x_\text{obstacle} = [2, 0]^T$. (a) Snapshot of the algorithm with 5 nodes. (b) Snapshot with 11 nodes. (c) Initial state reaching the covered space of the tree at 18 nodes, with irrelevant branches pruned. (d) Dynamic system trajectory following the constructed controller tree. Red cross: goal state $x_\text{goal} = [4, 4]^T$; Green cross: starting state $x_\text{start} = [-4, -4]^T$; Black square: bounded region $\mathcal{R}_\text{bound}$; Red circle: obstacle; Black dashed circle: obstacle clearance; Blue dots: controller node center locations $x_k$; Blue shaded region: individual node regions of attraction; Blue dashed shaded region: overall region of attraction of the tree ($\mathcal{R}_k$); Black lines: connections (edges) between tree nodes; Green line: system trajectory $x(t)$.
...and 3 more figures
