Reinforcement Learning with Lie Group Orientations for Robotics

Martin Schuck, Jan Brüdigam, Sandra Hirche, Angela Schoellig

TL;DR

The paper addresses the mismatch between Euclidean neural networks and the non-Euclidean structure of robot orientations by introducing a Lie-algebra–based input/output scheme. The network takes the tangent vector $\tau_s=\mathrm{Log}(s)$ as input and outputs a tangent vector $\tau_a$, which is mapped to a relative action $a=\mathrm{Exp}(\tau_a)$ (so that $\tau_a=\mathrm{Log}(a)$). States are updated via $s'=s\cdot a$, so orientations stay on the manifold throughout learning. An empirical study across 36 state/action representation pairs in three robotics tasks shows that Lie-algebra actions generally yield faster training and competitive policy performance compared to standard representations, with notable computational efficiency advantages. The approach is practical, works directly with existing deep-learning libraries, and applies broadly to orientation-controlled RL in robotics, offering a principled way to respect the underlying geometry.
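The update rule in the summary above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes unit quaternions in `(w, x, y, z)` order as the group representation, and `toy_policy` is a hypothetical stand-in for the neural network that simply steers toward the identity orientation.

```python
import math

def quat_mul(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    )

def exp_map(tau):
    """Exp: rotation vector in R^3 -> unit quaternion."""
    theta = math.sqrt(sum(t * t for t in tau))
    if theta < 1e-12:
        # First-order approximation near the identity.
        return (1.0, 0.5 * tau[0], 0.5 * tau[1], 0.5 * tau[2])
    k = math.sin(0.5 * theta) / theta
    return (math.cos(0.5 * theta), k * tau[0], k * tau[1], k * tau[2])

def log_map(q):
    """Log: unit quaternion -> rotation vector in R^3."""
    w, x, y, z = q
    if w < 0.0:
        # q and -q encode the same rotation; pick the short way around.
        w, x, y, z = -w, -x, -y, -z
    n = math.sqrt(x * x + y * y + z * z)
    if n < 1e-12:
        return (2.0 * x, 2.0 * y, 2.0 * z)
    k = 2.0 * math.atan2(n, w) / n
    return (k * x, k * y, k * z)

def toy_policy(tau_s, gain=0.5):
    """Hypothetical stand-in for the network: a step toward the identity."""
    return tuple(-gain * t for t in tau_s)

def step(s, policy):
    """One update: tau_s = Log(s), tau_a = policy(tau_s), s' = s * Exp(tau_a)."""
    tau_s = log_map(s)
    tau_a = policy(tau_s)
    a = exp_map(tau_a)
    return quat_mul(s, a)

s0 = exp_map((0.0, 0.0, 1.0))   # 1 rad rotation about the z-axis
s1 = step(s0, toy_policy)
print(log_map(s1))              # rotation vector of the new state
```

Because the network only ever sees and emits vectors in $\mathbb{R}^3$, the policy can be any standard network; the `Log` and `Exp` calls are the only geometry-aware parts, and the composed state remains a unit quaternion up to floating-point error.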

Abstract

Handling orientations of robots and objects is a crucial aspect of many applications. Yet, ever so often, there is a lack of mathematical correctness when dealing with orientations, especially in learning pipelines involving, for example, artificial neural networks. In this paper, we investigate reinforcement learning with orientations and propose a simple modification of the network's input and output that adheres to the Lie group structure of orientations. As a result, we obtain an easy and efficient implementation that is directly usable with existing learning libraries and achieves significantly better performance than other common orientation representations. We briefly introduce Lie theory specifically for orientations in robotics to motivate and outline our approach. Subsequently, a thorough empirical evaluation of different combinations of orientation representations for states and actions demonstrates the superior performance of our proposed approach in different scenarios, including: direct orientation control, end effector orientation control, and pick-and-place tasks.

Paper Structure

This paper contains 13 sections, 9 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Top: Our proposed learning architecture. Starting with an orientation state $\boldsymbol{s}\in\mathcal{M}$, we apply the $\mathrm{Log}$ map to obtain a vector ${}^{\boldsymbol{\mathcal{E}}}\boldsymbol{\tau}_{\boldsymbol{s}}\in\mathbb{R}^3$ in the tangent space. This vector is passed into the neural network to obtain an action vector ${}^{\boldsymbol{s}}\boldsymbol{\tau}_{\boldsymbol{a}}\in\mathbb{R}^3$ relative to $\boldsymbol{s}$. Applying the $\mathrm{Exp}$ map yields a relative action $\boldsymbol{a}\in\mathcal{M}$, which is composed with the original state $\boldsymbol{s}$ to obtain the new state $\boldsymbol{s}'=\boldsymbol{s}\cdot\boldsymbol{a}\in\mathcal{M}$. Bottom: The hardware setup for the pick-and-place task. A cube is moved from an initial pose to a goal pose in the air.
  • Figure 2: A comparison of continuous and discontinuous orientation representations. Top: the initial frame (1) is first turned $-90^\circ$ around the local $y$-axis to obtain frame (2), and then turned $90^\circ$ around the local $z$-axis to obtain frame (3). Bottom: Representing this frame transformation with a rotation matrix or a quaternion (with double cover) evolves continuously on their respective manifolds. Using Euler angles results in a discontinuity at the singularity.
  • Figure 3: Visualization of a Lie group on manifold $\mathcal{M}$, its Lie algebra $\mathfrak{m}$, i.e., the tangent space at the identity $T_{\boldsymbol{\mathcal{E}}}\mathcal{M}$, and the tangent space $T_{\boldsymbol{x}}\mathcal{M}$ at $\boldsymbol{x}\in\mathcal{M}$, where the following holds: $\boldsymbol{y} = \boldsymbol{x}\cdot\mathrm{Exp}({}^{\boldsymbol{x}}\boldsymbol{\tau})$.
  • Figure 4: Comparison of different orientation representations for state and action: Lie algebra $\mathfrak{m}$, rotation matrices $SO(3)$ ($SO_3$), two-column rotation matrices $SO_{1:2}(3)$ ($SO_3^{1:2}$), positive-real-part quaternions $\mathcal{S}^{3+}$, quaternions $\mathcal{S}^3$, Euler angles $\measuredangle$, and Riemannian manifold action RM. Top row: Results for direct orientation control. Bottom row: Results for end effector orientation control. Left column: Average success rate during training to measure convergence speed and overall success. Higher (blue) is better. Center column: Final success rate to measure best task success. Higher (blue) is better. Right column: Average reward per step of the final policy to measure policy performance. Closer to zero (blue) is better.
  • Figure 5: Task progression for direct orientation control. From the initial state $\boldsymbol{s}_0$, relative rotation actions $\boldsymbol{a}_i$ are taken to move toward the goal (not shown).
  • ...and 1 more figures
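
For reference, the $\mathrm{Exp}$ and $\mathrm{Log}$ maps in the captions of Figures 1 and 3 have well-known closed forms when orientations are written as rotation matrices in $SO(3)$. The expressions below are the standard textbook forms (Rodrigues' formula and its inverse) and are given here as an assumption about the convention, since the captions do not spell them out. Here $[\boldsymbol{\tau}]_\times$ denotes the skew-symmetric matrix of $\boldsymbol{\tau}\in\mathbb{R}^3$, $(\cdot)^\vee$ its inverse, and $\theta=\lVert\boldsymbol{\tau}\rVert$:

$$
\mathrm{Exp}(\boldsymbol{\tau}) = \boldsymbol{I} + \frac{\sin\theta}{\theta}\,[\boldsymbol{\tau}]_\times + \frac{1-\cos\theta}{\theta^{2}}\,[\boldsymbol{\tau}]_\times^{2},
\qquad
\mathrm{Log}(\boldsymbol{R}) = \frac{\theta}{2\sin\theta}\,\bigl(\boldsymbol{R}-\boldsymbol{R}^{\top}\bigr)^{\vee},
\quad
\theta = \arccos\!\left(\frac{\operatorname{tr}(\boldsymbol{R})-1}{2}\right).
$$

The $\mathrm{Log}$ expression holds for $0<\theta<\pi$; at $\theta\to 0$ both maps reduce to their first-order limits, $\mathrm{Exp}(\boldsymbol{\tau})\approx\boldsymbol{I}+[\boldsymbol{\tau}]_\times$ and $\mathrm{Log}(\boldsymbol{R})\approx\tfrac{1}{2}(\boldsymbol{R}-\boldsymbol{R}^{\top})^{\vee}$.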