Symmetry Considerations for Learning Task Symmetric Robot Policies

Mayank Mittal; Nikita Rudin; Victor Klemm; Arthur Allshire; Marco Hutter

TL;DR

This paper investigates two approaches to incorporating symmetry invariance into DRL, data augmentation and a mirror loss function, and shows that the augmentation-based approach achieves faster convergence and improves the learned behaviors in various challenging robotic tasks, from climbing boxes with a quadruped to dexterous manipulation.

Abstract

Symmetry is a fundamental aspect of many real-world robotic tasks. However, current deep reinforcement learning (DRL) approaches can seldom harness and exploit symmetry effectively. Often, the learned behaviors fail to achieve the desired transformation invariances and suffer from motion artifacts. For instance, a quadruped may exhibit different gaits when commanded to move forward or backward, even though it is symmetrical about its torso. This issue becomes further pronounced in high-dimensional or complex environments, where DRL methods are prone to local optima and fail to explore regions of the state space equally. Past methods on encouraging symmetry for robotic tasks have studied this topic mainly in a single-task setting, where symmetry usually refers to symmetry in the motion, such as the gait patterns. In this paper, we revisit this topic for goal-conditioned tasks in robotics, where symmetry lies mainly in task execution and not necessarily in the learned motions themselves. In particular, we investigate two approaches to incorporate symmetry invariance into DRL -- data augmentation and mirror loss function. We provide a theoretical foundation for using augmented samples in an on-policy setting. Based on this, we show that the corresponding approach achieves faster convergence and improves the learned behaviors in various challenging robotic tasks, from climbing boxes with a quadruped to dexterous manipulation.
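
To make the two approaches named in the abstract concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a cart-pole whose state is `[x, x_dot, theta, theta_dot]` and whose action is a single force, so that a left/right reflection negates every entry; this layout, the linear policy, and all function names are illustrative assumptions. The mirror loss is written in one common form, the squared gap between the policy output and the mirrored output at the mirrored state, which may differ from the paper's exact definition.

```python
import numpy as np

# Assumed cart-pole layout (not taken from the paper):
# state = [x, x_dot, theta, theta_dot], action = [force].
# A left/right reflection negates every entry.
STATE_MIRROR = np.diag([-1.0, -1.0, -1.0, -1.0])
ACTION_MIRROR = np.diag([-1.0])


def mirror_state(states):
    """Apply the state reflection to a batch of shape (N, 4)."""
    return states @ STATE_MIRROR.T


def mirror_action(actions):
    """Apply the action reflection to a batch of shape (N, 1)."""
    return actions @ ACTION_MIRROR.T


def augment_batch(states, actions, returns):
    """Data augmentation: append the mirrored copy of every sample."""
    aug_states = np.concatenate([states, mirror_state(states)], axis=0)
    aug_actions = np.concatenate([actions, mirror_action(actions)], axis=0)
    # Returns are assumed invariant under the symmetry, so they are copied.
    aug_returns = np.concatenate([returns, returns], axis=0)
    return aug_states, aug_actions, aug_returns


def mirror_loss(policy_mean, states):
    """Mean squared gap between pi(s) and M_a pi(M_s s)."""
    direct = policy_mean(states)
    mirrored = mirror_action(policy_mean(mirror_state(states)))
    return float(np.mean(np.sum((direct - mirrored) ** 2, axis=1)))


def linear_policy(weights, bias):
    """Hypothetical deterministic policy mean: a = W s + b."""
    return lambda states: states @ weights.T + bias


rng = np.random.default_rng(0)
states = rng.normal(size=(8, 4))
weights = rng.normal(size=(1, 4))

symmetric = linear_policy(weights, np.zeros(1))        # odd in s, so equivariant
asymmetric = linear_policy(weights, np.array([0.5]))   # the bias breaks the symmetry

actions = symmetric(states)
returns = rng.normal(size=8)
aug_states, aug_actions, aug_returns = augment_batch(states, actions, returns)

sym_loss = mirror_loss(symmetric, states)
asym_loss = mirror_loss(asymmetric, states)
print(aug_states.shape, sym_loss, asym_loss)
```

In this toy setting the bias-free policy already satisfies the symmetry and incurs no mirror loss, while the biased policy is penalized. Augmentation leaves the loss untouched and instead doubles the batch with mirrored samples.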

Paper Structure (18 sections, 5 equations, 7 figures, 1 table)

Introduction
Related Work
Contributions
Preliminaries
Reinforcement Learning
MDP with Group Symmetries
Approaches for Symmetry in RL
Using Mirror Loss Function
Symmetry-Based Data Augmentation
Experiments and results
Tasks
Metrics
Training Performance
Effect of network initialization
Evaluation of Symmetry in Learned Behaviors
...and 3 more sections

Figures (7)

Figure 1: Motion and task symmetry for quadrupeds. While motion symmetry involves similar movements of the legs, it does not guarantee that the robot behaves the same when commanded different goals (walking forward and backward). In contrast, task symmetry ensures consistent behaviors for such goals, potentially resulting in periodic symmetric motions for walking on flat ground or entirely asymmetrical aperiodic patterns for tasks such as climbing a box.
Figure 2: The log action probabilities computed using baseline (Eq. \ref{equ:policy-grad}) and our proposed (Eq. \ref{equ:sym-aug}) approaches. We plot the mean obtained over the symmetry-augmented samples from each training iteration. The plot shows 5 runs with different seeds for the CartPole task. The baseline method leads to training instabilities caused by low action probabilities. Meanwhile, our approach maintains stable convergence for all runs.
Figure 3: We consider four robotic tasks: a continuous cart-pole, quadruped climbing a box, quadruped manipulating a cube, and in-hand cube reposing. In the table, we specify their state and action spaces along with the available symmetry transformations.
Figure 4: Comparison of different methods for the CartPole and ANYmal-Climb tasks -- vanilla PPO (baseline), PPO with symmetry augmentation (aug.), PPO with symmetry loss (loss-w), and a combination of the two. We plot the mean and standard deviation over three seeds. For the ANYmal-Climb task, we use a curriculum denoted as phases A, B, and C in the plot. We observe that symmetry augmentation yields the best performance consistently over all the tasks.
Figure 5: Effect of network initialization scales (init-n) for the CartPole task. We plot the mean and standard deviation over three seeds. Symmetry augmentation (aug.) struggles when initialized weights are high. Adding a small symmetry loss helps mitigate the issue but does not improve the performance.
...and 2 more figures
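
The Figure 2 caption attributes the baseline's instabilities to low action probabilities on symmetry-augmented samples. The toy calculation below illustrates one plausible reading of that effect and does not reproduce the paper's equations. It assumes a Gaussian policy with a linear mean and a fixed standard deviation, and a reflection that negates both state and action; all names and numbers are hypothetical.

```python
import numpy as np


def gaussian_log_prob(action, mean, std):
    """Log density of a scalar Gaussian N(mean, std^2) at `action`."""
    var = std ** 2
    return float(-0.5 * ((action - mean) ** 2 / var + np.log(2.0 * np.pi * var)))


def policy_mean(state, weights, bias):
    """Hypothetical linear policy mean: a = w . s + b."""
    return float(state @ weights + bias)


# Illustrative values only; none of them come from the paper.
state = np.array([0.1, -0.2, 0.05, 0.3])
weights = np.array([1.0, 0.5, -2.0, 0.8])
std = 0.1

logp = {}
for name, bias in (("symmetric", 0.0), ("asymmetric", 0.5)):
    # Take the mean action as the sample collected at `state`.
    action = policy_mean(state, weights, bias)
    # The augmented sample is (-state, -action); evaluate it under the same policy.
    mirrored_mean = policy_mean(-state, weights, bias)
    logp[name] = (
        gaussian_log_prob(action, action, std),           # original sample
        gaussian_log_prob(-action, mirrored_mean, std),   # mirrored sample
    )

for name, (original, mirrored) in logp.items():
    print(f"{name}: original={original:.3f}, mirrored={mirrored:.3f}")
```

Under these assumptions the mirrored sample is as likely as the original when the policy is symmetric, but its log probability drops sharply once the policy is not. This is the kind of low-probability augmented sample that can destabilize an on-policy update.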
