Reinforcement Learning with Lie Group Orientations for Robotics
Martin Schuck, Jan Brüdigam, Sandra Hirche, Angela Schoellig
TL;DR
The paper addresses the mismatch between Euclidean neural networks and the non-Euclidean structure of robot orientations by introducing a Lie-algebra–based input/output scheme. The network receives the tangent vector $\tau_s=\operatorname{Log}(s)$ as input and outputs a tangent vector $\tau_a$, which is mapped back to the group as the action $a=\operatorname{Exp}(\tau_a)$ (equivalently, $\tau_a=\operatorname{Log}(a)$). States are updated via $s'=s\cdot a$, so they stay on the orientation manifold throughout learning. A thorough empirical study across 36 state/action representation pairs in three robotics tasks demonstrates that Lie-algebra actions generally yield faster training and competitive policy performance compared to standard representations, with notable computational efficiency advantages. The approach is practical, library-friendly, and broadly applicable to orientation-controlled RL in robotics, offering a principled way to respect the underlying geometry while leveraging existing deep-learning tools.
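To make the input/output scheme concrete, the following is a minimal sketch, not the authors' implementation. It assumes orientations are stored as unit quaternions in (w, x, y, z) order and that the Lie-algebra element is a rotation vector (axis times angle). The function names `quat_exp`, `quat_log`, `quat_mul`, and `step` are hypothetical, and the hand-picked `tau_a` stands in for a policy network's output.

```python
import math

def quat_mul(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    )

def quat_exp(tau):
    """Exp map: rotation vector (Lie algebra) -> unit quaternion (group)."""
    theta = math.sqrt(sum(c * c for c in tau))
    # sin(theta/2)/theta tends to 1/2 as theta -> 0; avoid dividing by zero.
    k = 0.5 if theta < 1e-8 else math.sin(theta / 2.0) / theta
    q = (math.cos(theta / 2.0), k * tau[0], k * tau[1], k * tau[2])
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def quat_log(q):
    """Log map: unit quaternion (group) -> rotation vector (Lie algebra)."""
    w, x, y, z = q
    if w < 0.0:
        # q and -q encode the same rotation; pick the shorter one.
        w, x, y, z = -w, -x, -y, -z
    n = math.sqrt(x * x + y * y + z * z)
    # 2*atan2(n, w)/n tends to 2 as n -> 0 for a unit quaternion.
    k = 2.0 if n < 1e-8 else 2.0 * math.atan2(n, w) / n
    return (k * x, k * y, k * z)

def step(s, tau_a):
    """State update s' = s * Exp(tau_a); the result stays a unit quaternion."""
    return quat_mul(s, quat_exp(tau_a))

# Usage: start at the identity and apply the same tangent-space action twice.
s = (1.0, 0.0, 0.0, 0.0)
tau_a = (0.0, 0.0, 0.3)   # stand-in for a network output: 0.3 rad about z
s = step(s, tau_a)
s = step(s, tau_a)
tau_s = quat_log(s)       # what the network would receive as input
```

In this sketch the network only ever sees and emits unconstrained 3-vectors, while the group product in `step` keeps the state a valid orientation without any projection or renormalization of the network output.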
Abstract
Handling orientations of robots and objects is a crucial aspect of many applications. Yet orientations are often handled without mathematical rigor, especially in learning pipelines that involve, for example, artificial neural networks. In this paper, we investigate reinforcement learning with orientations and propose a simple modification of the network's input and output that adheres to the Lie group structure of orientations. As a result, we obtain an easy and efficient implementation that is directly usable with existing learning libraries and achieves significantly better performance than other common orientation representations. We briefly introduce Lie theory specifically for orientations in robotics to motivate and outline our approach. Subsequently, a thorough empirical evaluation of different combinations of orientation representations for states and actions demonstrates the superior performance of our proposed approach in different scenarios, including direct orientation control, end effector orientation control, and pick-and-place tasks.
