A novel agent with formal goal-reaching guarantees: an experimental study with a mobile robot

Grigory Yaremenko, Dmitrii Dobriborsci, Roman Zashchitin, Ruben Contreras Maestre, Ngoc Quoc Huy Hoang, Pavel Osinenko

TL;DR

This work presents a novel safe model-free RL agent called Critic As Lyapunov Function (CALF) and showcases how CALF can be used to improve upon control baselines in robotics in an efficient and convenient fashion while ensuring guarantees of stable goal reaching.

Abstract

Reinforcement Learning (RL) has been shown to be effective and convenient for a number of tasks in robotics. However, it requires exploring a sufficiently large number of state-action pairs, many of which may be unsafe or unimportant. For instance, online model-free learning can be hazardous and inefficient without guarantees that a certain set of desired states will be reached during an episode. An increasingly common approach to safety is to add a shielding system that constrains the RL actions to a safe set. The difficulty with such frameworks is coupling RL with the shielding system so that exploration is not excessively restricted. This work presents a novel safe model-free RL agent called Critic As Lyapunov Function (CALF) and showcases how CALF can improve upon control baselines in robotics efficiently and conveniently while guaranteeing stable goal reaching, which is generally regarded as a crucial part of safety. With CALF, all state-action pairs remain explorable, yet reaching the desired goal states is formally guaranteed. A formal analysis establishes the goal-stabilizing properties of CALF, and a set of real-world and numerical experiments with a non-holonomic wheeled mobile robot (WMR), the TurtleBot3 Burger, confirmed the superiority of CALF over proximal policy optimization (PPO), a well-established RL agent, and over a modified version of SARSA in a few-episode setting in terms of attained total cost.

Paper Structure

This paper contains 15 sections, 1 theorem, 19 equations, 4 figures, 1 table, and 1 algorithm.

Key Result

Theorem 1

Consider the problem eqn_value under the MDP eqn_mdp. Let $\pi^0 \in \Pi_0$ have a goal-reaching property for $\mathbb{G} \subset \mathbb{S}$. Let $\pi_t$ be produced by Algorithm 1 for all $t \ge 0$. Then a similar goal-reaching property is preserved under $\pi_t$.
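The idea behind Theorem 1 can be illustrated with a toy simulation. The sketch below is not the paper's Algorithm 1, which is not reproduced on this page. It assumes that an exploratory action is applied only when a critic value decreases by a fixed margin relative to the last accepted value and stays within state-dependent bounds, and that a goal-reaching baseline $\pi^0$ is used otherwise. Everything named here is hypothetical: the point-robot dynamics, the fixed quadratic `critic`, the random "learner", the constants `DT`, `DECAY`, `KAPPA_UP`, and the functions `baseline_policy` and `run_episode`.

```python
import math
import random

DT = 0.1        # integration step (assumed)
DECAY = 0.5     # required critic decrease per accepted action (assumed)
KAPPA_UP = 2.0  # critic must not exceed KAPPA_UP * ||s||^2 (assumed)


def baseline_policy(state):
    """Hypothetical baseline pi^0: steer a point robot toward the origin."""
    return (-state[0], -state[1])


def critic(state, action):
    """Fixed stand-in for a learned critic Q(s, a); a real agent would train it."""
    sq_state = state[0] ** 2 + state[1] ** 2
    sq_action = action[0] ** 2 + action[1] ** 2
    return sq_state + 0.1 * sq_action


def run_episode(seed, steps=200):
    """Return (final distance to the origin, number of accepted learner actions)."""
    rng = random.Random(seed)
    state = (2.0, -1.5)
    # Last critic value that passed the acceptance test.
    best_value = critic(state, baseline_policy(state))
    accepted = 0
    for _ in range(steps):
        # Stand-in learner: an arbitrary, possibly poor, exploratory action.
        proposal = (rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))
        value = critic(state, proposal)
        sq_norm = state[0] ** 2 + state[1] ** 2
        decays = value <= best_value - DECAY
        bounded = sq_norm <= value <= KAPPA_UP * sq_norm
        if decays and bounded:
            # The critic certifies progress: apply the learner's action.
            action, best_value = proposal, value
            accepted += 1
        else:
            # Otherwise fall back to the goal-reaching baseline.
            action = baseline_policy(state)
        # Single-integrator dynamics, used only for illustration.
        state = (state[0] + DT * action[0], state[1] + DT * action[1])
    return math.hypot(state[0], state[1]), accepted


if __name__ == "__main__":
    for seed in range(3):
        distance, accepted = run_episode(seed)
        print(f"seed={seed} final_distance={distance:.2e} accepted={accepted}")
```

In this toy setting the critic is non-negative and every accepted action lowers the stored value by at least `DECAY`, so only finitely many learner actions can be accepted (at most 13 from the initial value of 6.875). All remaining steps follow the baseline, which contracts the state toward the origin. This mirrors the intuition that exploration is permitted without giving up the baseline's goal-reaching behaviour.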

Figures (4)

  • Figure 1: WMR kinematics and its frames of interest.
  • Figure 2: Real-world runs reproducing the best episodes of the respective algorithms (overlaid). The full footage can be found via the following link: https://youtu.be/RgiDHzE5-w8?si=Kig6bNl8Cd7dTrzP.
  • Figure 3: Learning curves obtained from 20 seeds [1..20].
  • Figure 4: Trajectories in the best episodes over 20 random seeds. Goal reaching succeeded in 100 % of episodes with CALF, as expected, in no more than 30 % with PPO, and in about 50 % with SARSA-m.

Theorems & Definitions (3)

  • Definition 1
  • Remark 1
  • Theorem 1