Table of Contents
Fetching ...

Reinforcement Learning with Euclidean Data Augmentation for State-Based Continuous Control

Jinzhu Luo, Dingyang Chen, Qi Zhang

TL;DR

This work considers an alternative data augmentation applied to state-based control features based on Euclidean symmetries under transformations like rotations, and shows this new Euclidean data augmentation strategy significantly improves both data efficiency and asymptotic performance of RL on a wide range of continuous control tasks.

Abstract

Data augmentation creates new data points by transforming the original ones for a reinforcement learning (RL) agent to learn from, which has been shown to be effective for the objective of improving the data efficiency of RL for continuous control. Prior work towards this objective has been largely restricted to perturbation-based data augmentation where new data points are created by perturbing the original ones, which has been impressively effective for tasks where the RL agent observes control states as images with perturbations including random cropping, shifting, etc. This work focuses on state-based control, where the RL agent can directly observe raw kinematic and task features, and considers an alternative data augmentation applied to these features based on Euclidean symmetries under transformations like rotations. We show that the default state features used in exiting benchmark tasks that are based on joint configurations are not amenable to Euclidean transformations. We therefore advocate using state features based on configurations of the limbs (i.e., the rigid bodies connected by the joints) that instead provide rich augmented data under Euclidean transformations. With minimal hyperparameter tuning, we show this new Euclidean data augmentation strategy significantly improves both data efficiency and asymptotic performance of RL on a wide range of continuous control tasks.

Reinforcement Learning with Euclidean Data Augmentation for State-Based Continuous Control

TL;DR

This work considers an alternative data augmentation applied to state-based control features based on Euclidean symmetries under transformations like rotations, and shows this new Euclidean data augmentation strategy significantly improves both data efficiency and asymptotic performance of RL on a wide range of continuous control tasks.

Abstract

Data augmentation creates new data points by transforming the original ones for a reinforcement learning (RL) agent to learn from, which has been shown to be effective for the objective of improving the data efficiency of RL for continuous control. Prior work towards this objective has been largely restricted to perturbation-based data augmentation where new data points are created by perturbing the original ones, which has been impressively effective for tasks where the RL agent observes control states as images with perturbations including random cropping, shifting, etc. This work focuses on state-based control, where the RL agent can directly observe raw kinematic and task features, and considers an alternative data augmentation applied to these features based on Euclidean symmetries under transformations like rotations. We show that the default state features used in exiting benchmark tasks that are based on joint configurations are not amenable to Euclidean transformations. We therefore advocate using state features based on configurations of the limbs (i.e., the rigid bodies connected by the joints) that instead provide rich augmented data under Euclidean transformations. With minimal hyperparameter tuning, we show this new Euclidean data augmentation strategy significantly improves both data efficiency and asymptotic performance of RL on a wide range of continuous control tasks.

Paper Structure

This paper contains 36 sections, 2 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Illustration of Cheetah from DMControl, including its rendering, tree-structured morphology with the nodes being the limbs and the edges the joints, and the state features.
  • Figure 2: ${\rm SO}_{\vec{\bm{g}}}(3)$ rotation in Cheetah.
  • Figure 3: Learning curves comparing data efficiency of our method again all baselines except for SEGNN on 9 out of the 10 tasks, excluding the task of Reacher_hard. The results involving SEGNN and Reacher_hard are deferred to Figure \ref{['fig:online_main_SEGNN']}.
  • Figure 4: Learning curves of data efficiency ( top) and run time for 1M steps in total ( bottom) for our method and all baselines on Reacher_hard.
  • Figure 5: Learning curves of our method on the effect of $\rho_{\rm aug}$ on three sample tasks.
  • ...and 3 more figures