Safe Reinforcement Learning on the Constraint Manifold: Theory and Applications

Puze Liu, Haitham Bou-Ammar, Jan Peters, Davide Tateo

TL;DR

This article shows how to impose complex safety constraints on learning-based robotic systems in a principled manner, from both theoretical and practical points of view. It demonstrates the method's effectiveness in a real-world robot air hockey task, showing that it can handle high-dimensional tasks with complex constraints.

Abstract

Integrating learning-based techniques, especially reinforcement learning, into robotics is promising for solving complex problems in unstructured environments. However, most existing approaches are trained in well-tuned simulators and subsequently deployed on real robots without online fine-tuning. In this setting, extensive engineering is required to mitigate the sim-to-real gap, which can be challenging for complex systems. Instead, learning with real-world interaction data offers a promising alternative: it not only eliminates the need for a fine-tuned simulator but also applies to a broader range of tasks where accurate modeling is unfeasible. One major problem for on-robot reinforcement learning is ensuring safety, as uncontrolled exploration can cause catastrophic damage to the robot or the environment. Indeed, safety specifications, often represented as constraints, can be complex and non-linear, making safety challenging to guarantee in learning systems. In this paper, we show how we can impose complex safety constraints on learning-based robotics systems in a principled manner, both from theoretical and practical points of view. Our approach is based on the concept of the Constraint Manifold, representing the set of safe robot configurations. Exploiting differential geometry techniques, i.e., the tangent space, we can construct a safe action space, allowing learning agents to sample arbitrary actions while ensuring safety. We demonstrate the method's effectiveness in a real-world Robot Air Hockey task, showing that our method can handle high-dimensional tasks with complex constraints. Videos of the real robot experiments are available on the project website (https://puzeliu.github.io/TRO-ATACOM).
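
The tangent-space idea in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the unit-disk constraint `k`, the velocity-controlled system, the linear slack dynamics `mu_dot = mu * u_mu`, the gain `lam`, and all function names are hypothetical choices made for this example. The constraint `k(s) <= 0` is turned into the equality `c(s, mu) = k(s) + mu = 0` with a slack `mu > 0`, and an arbitrary action `u` is mapped onto the null space of the Jacobian of `c`, plus a term that contracts `c` back to zero.

```python
import numpy as np

def k(s):
    """Hypothetical safety constraint k(s) <= 0: stay inside the unit disk."""
    return s @ s - 1.0

def safe_velocity(s, mu, u, lam=10.0):
    """Map an arbitrary action u in R^2 to velocities of s and of the slack mu.

    Assumes velocity control (s_dot = u_s) and linear slack dynamics
    (mu_dot = mu * u_mu); both are illustrative assumptions.
    """
    c = k(s) + mu                                   # manifold is c = 0
    J = np.array([[2.0 * s[0], 2.0 * s[1], mu]])    # dc/dt = J @ [u_s, u_mu]
    _, _, vt = np.linalg.svd(J)                     # full 3x3 V^T
    N = vt[1:].T                                    # (3, 2) null-space basis of J
    v = N @ u - lam * np.linalg.pinv(J)[:, 0] * c   # tangent motion + contraction
    return v[:2], mu * v[2]                         # s_dot, mu_dot

def rollout(steps=5000, dt=0.01, seed=0):
    """Euler rollout with random actions; returns the largest ||s|| and final mu."""
    rng = np.random.default_rng(seed)
    s = np.zeros(2)
    mu = -k(s)                                      # start on the manifold (c = 0)
    max_radius = 0.0
    for _ in range(steps):
        u = rng.uniform(-1.0, 1.0, size=2)          # arbitrary exploratory action
        s_dot, mu_dot = safe_velocity(s, mu, u)
        s = s + dt * s_dot
        mu = mu + dt * mu_dot
        max_radius = max(max_radius, float(np.linalg.norm(s)))
    return max_radius, mu

if __name__ == "__main__":
    max_radius, mu = rollout()
    print(f"max ||s|| = {max_radius:.4f}, final slack mu = {mu:.4f}")
```

In continuous time the state stays on the manifold, so `k(s) = -mu < 0` holds for every action. With explicit Euler integration the guarantee only holds up to a discretization error that shrinks with `dt`, which is why the contraction term is included. The null-space basis returned by the SVD is not guaranteed to vary smoothly with the state.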

Paper Structure

This paper contains 33 sections, 9 theorems, 56 equations, 12 figures, 6 tables, 2 algorithms.

Key Result

Theorem 1

Let $\mathcal{M}$ and $\mathcal{N}$ be smooth manifolds, and let $\Phi: \mathcal{M} \rightarrow \mathcal{N}$ be a smooth map with constant rank $r$, then each level set of $\Phi$ is a properly embedded submanifold of codimension $r$ in $\mathcal{M}$.
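
As a worked illustration of the theorem (the specific maps below are our own choices for this example, not taken from the paper), consider the squared norm on punctured three-dimensional space:

$$
\Phi: \mathbb{R}^3 \setminus \{0\} \rightarrow \mathbb{R}, \qquad \Phi(x) = \Vert x \Vert^2, \qquad D\Phi(x) = 2x^\intercal \neq 0 .
$$

The differential never vanishes on the domain, so $\Phi$ has constant rank $r = 1$. The level set $\Phi^{-1}(1)$ is the unit sphere, an embedded submanifold of dimension $3 - 1 = 2$, i.e., of codimension $1$.

The same argument applies to a slack-augmented constraint. Assuming $m$ smooth inequality constraints $k(\bm{s}) \leq 0$ with $\bm{s} \in \mathbb{R}^n$ are rewritten as an equality with slack variables $\bm{\mu} \in \mathbb{R}^m$ (this particular form is an assumption made for illustration):

$$
c(\bm{s}, \bm{\mu}) = k(\bm{s}) + \bm{\mu}, \qquad Dc(\bm{s}, \bm{\mu}) = \begin{bmatrix} J_k(\bm{s}) & I_m \end{bmatrix}, \qquad \operatorname{rank} Dc = m .
$$

The identity block gives $Dc$ rank $m$ everywhere, so the level set $\{ c = 0 \}$ is an embedded submanifold of $\mathbb{R}^{n+m}$ of codimension $m$, hence of dimension $n$.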

Figures (12)

  • Figure 1: The robot air hockey task. The objective is to strike the puck into the opponent's goal. The vector field shows the velocity of the end-effector at different locations when a positive unit action is applied in the first two dimensions of the safe action space using ATACOM. The blue (resp. red) arrow corresponds to a unit action applied in the first (resp. second) dimension.
  • Figure 2: Conceptual illustration of ATACOM. (a) The safe set is defined by the constraint in the original state space. (b) The constraint manifold is constructed in the augmented state space. (c) The tangent space is determined at each point on the constraint manifold. (d) The trajectory moves along the tangent space, resulting in a safe trajectory when projected onto the original state space.
  • Figure 3: Illustration of the Constraint Manifolds, Singular Set, and the Region of Attraction in Example \ref{example:singular_set}. The thick blue lines depict the constraint manifold, and the red points are singular points. The contraction term shrinks to zero at the singular point, indicating a saddle point of the Lyapunov function $V$. The region of attraction $\Omega_\eta$ excludes the singular set and is shown as the blue shaded area.
  • Figure 4: Comparison of trajectories with different slack functions $\alpha(\bm{\mu})$ in a 2D environment. The grey area renders an obstacle. The constraint is defined as $\Vert \bm{s} - {\bm{p}}_{o} \Vert > 0$. The system is controlled by velocity, $\dot{\bm{s}} = {\bm{u}}_s$. The curves show the trajectories starting from different initial points with a constant control input ${\bm{u}}_s = [1 \quad 0]^\intercal$. The upper half shows the trajectories with exponential (E) and linear (L) slack dynamics. The lower half shows trajectories with different $\beta$ parameters using exponential slack dynamics.
  • Figure 5: Comparison of Nonsmooth (N) and Smooth (S) bases using the Linear (LIN) and the Exponential (EXP) slack dynamics functions for two different constraints. Top Row: $-(s_1^2 + s_2^2) + 1 \leq 0$, Bottom Row: $\cos(4s_1)+s_2^2 - 0.8 \leq 0$. (a) 3D manifold with linear slack dynamics; the tangent space bases are obtained from QR decomposition and do not vary smoothly. (b) The tangent space bases projected onto the original $S_1-S_2$ space. (c) Projected smooth tangent space bases with linear slack dynamics, computed by Alg. \ref{alg:tangent_basis}. The projected bases are not orthogonal in the projected space. (d) Projected smooth tangent space bases with exponential slack dynamics. The tangent space bases are less deformed when the state is away from the boundary. (e) The smooth tangent space bases in $S_1-S_2-\mu$ space.
  • ...and 7 more figures
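
The QR-based tangent basis mentioned in the caption of Figure 5(a) can be sketched as follows for the top-row constraint $-(s_1^2 + s_2^2) + 1 \leq 0$. This is a hedged illustration: the slack-augmented form `c(s, mu) = k(s) + mu`, the linear slack dynamics `mu_dot = mu * u_mu`, the resulting Jacobian, and the function name `tangent_basis_qr` are assumptions made for this example rather than details confirmed by the text above.

```python
import numpy as np

def tangent_basis_qr(s, mu):
    """Return the assumed Jacobian J (1x3) and an orthonormal null-space basis N (3x2).

    Assumes k(s) = 1 - s1^2 - s2^2 <= 0, c(s, mu) = k(s) + mu, and linear
    slack dynamics mu_dot = mu * u_mu, so that dc/dt = J @ [u_s, u_mu].
    """
    J = np.array([[-2.0 * s[0], -2.0 * s[1], mu]])
    # Complete QR of J^T: the first column of Q spans the row space of J,
    # the remaining columns span its null space (the tangent space).
    Q, _ = np.linalg.qr(J.T, mode="complete")
    return J, Q[:, 1:]

s = np.array([2.0, 0.0])        # a point outside the unit disk, so k(s) < 0
mu = -(1.0 - s @ s)             # slack chosen so that c(s, mu) = 0
J, N = tangent_basis_qr(s, mu)
print("J @ N   =", J @ N)       # numerically zero: columns of N are tangent
print("N^T @ N =\n", N.T @ N)   # numerically the identity: N is orthonormal
```

The QR factorization fixes an orthonormal basis independently at each point, so nothing ties the basis at one state to the basis at a neighbouring state. That is consistent with the caption's remark that the QR bases do not vary smoothly.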

Theorems & Definitions (30)

  • Theorem 1: Constant-Rank Level Set Theorem
  • Definition 1
  • Theorem 2: LaSalle’s Invariance Principle
  • Remark 1
  • Proposition 1
  • Proof
  • Example 1
  • Lemma 1
  • Proof
  • Remark 2
  • ...and 20 more