Time Reversal Symmetry for Efficient Robotic Manipulations in Deep Reinforcement Learning

Yunpeng Jiang, Jianshu Hu, Paul Weng, Yutong Ban

TL;DR

This work tackles low sample efficiency in deep reinforcement learning for robotic manipulation by introducing Time Reversal symmetry enhanced DRL (TR-DRL). It formalizes partial time reversal (PTR) symmetry and combines two complementary components: trajectory reversal augmentation with a dynamics-aware filter to exploit fully reversible transitions, and time reversal symmetry guided reward shaping to leverage partially reversible state components. The method learns inverse and forward dynamics to validate reversed transitions and uses potential-based shaping trained on successful reversed trajectories to guide policy learning. Empirical results on Robosuite and MetaWorld demonstrate superior sample efficiency and stronger final performance, with ablations confirming the value of each component. The approach broadens symmetry-based DRL by incorporating temporal structure, enabling more data-efficient learning in temporally symmetric manipulation tasks.
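The potential-based shaping mentioned above can be illustrated with the standard form, in which a shaped reward adds the discounted change in a potential function to the environment reward. The following is only a minimal sketch under stated assumptions: `shaped_reward`, `toy_potential`, the goal position, and the discount value are hypothetical names and values chosen for illustration. In TR-DRL the potential is a learned model trained on successful reversed trajectories; here it is replaced by a hand-written distance-to-goal function.

```python
import numpy as np


def shaped_reward(reward, state, next_state, potential, gamma=0.99):
    """Standard potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return reward + gamma * potential(next_state) - potential(state)


# Hypothetical stand-in for a learned potential model (assumption, not the
# paper's model): states closer to a fixed goal receive a higher potential.
GOAL = np.array([1.0, 0.0])


def toy_potential(state):
    return -float(np.linalg.norm(state - GOAL))


start = np.array([0.0, 0.0])
toward = shaped_reward(0.0, start, np.array([0.5, 0.0]), toy_potential)
away = shaped_reward(0.0, start, np.array([-0.5, 0.0]), toy_potential)
print(toward, away)
```

With a zero environment reward, the step toward the goal receives a positive shaped reward and the step away receives a negative one, which is the kind of guidance signal shaping is meant to provide.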

Abstract

Symmetry is pervasive in robotics and has been widely exploited to improve sample efficiency in deep reinforcement learning (DRL). However, existing approaches primarily focus on spatial symmetries, such as reflection, rotation, and translation, while largely neglecting temporal symmetries. To address this gap, we explore time reversal symmetry, a form of temporal symmetry commonly found in robotics tasks such as door opening and closing. We propose Time Reversal symmetry enhanced Deep Reinforcement Learning (TR-DRL), a framework that combines trajectory reversal augmentation and time reversal guided reward shaping to efficiently solve temporally symmetric tasks. Our method generates reversed transitions from fully reversible transitions, identified by a proposed dynamics-consistent filter, to augment the training data. For partially reversible transitions, we apply reward shaping to guide learning, according to successful trajectories from the reversed task. Extensive experiments on the Robosuite and MetaWorld benchmarks demonstrate that TR-DRL is effective in both single-task and multi-task settings, achieving higher sample efficiency and stronger final performance compared to baseline methods.
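One way to picture the dynamics-consistent filter is sketched below. The exact criterion used by TR-DRL is not given in this summary, so the check here is an assumption: an inverse model proposes the action that would undo a transition, a forward model predicts where that action leads, and the reversed transition is kept only if the prediction lands back on the original state. The function names, the tolerance, and the toy dynamics (one freely movable state dimension and one ratchet-like dimension that can only increase) are all hypothetical; real models would be learned networks, and rewards for reversed transitions are omitted.

```python
import numpy as np


def forward_model(state, action):
    """Toy stand-in for a learned forward model (assumption):
    dimension 0 moves freely, dimension 1 can only increase."""
    delta = np.array([action[0], max(action[1], 0.0)])
    return state + delta


def inverse_model(state, target):
    """Toy stand-in for a learned inverse model (assumption):
    proposes the action intended to move `state` to `target`."""
    return target - state


def reverse_if_consistent(state, next_state, tol=1e-3):
    """Return the reversed transition (next_state, action_rev, state),
    or None if the forward model cannot reproduce it within `tol`."""
    action_rev = inverse_model(next_state, state)
    predicted = forward_model(next_state, action_rev)
    if np.linalg.norm(predicted - state) > tol:
        return None
    return next_state, action_rev, state


def augment(transitions, tol=1e-3):
    """Collect reversed copies of the transitions that pass the filter."""
    reversed_transitions = []
    for state, _action, next_state in transitions:
        candidate = reverse_if_consistent(state, next_state, tol)
        if candidate is not None:
            reversed_transitions.append(candidate)
    return reversed_transitions


batch = [
    # Only the free dimension changes: reversible in the toy dynamics.
    (np.array([0.0, 0.0]), np.array([0.5, 0.0]), np.array([0.5, 0.0])),
    # The ratchet dimension changes: cannot be undone in the toy dynamics.
    (np.array([0.0, 0.0]), np.array([0.0, 0.3]), np.array([0.0, 0.3])),
]
augmented = augment(batch)
print(len(augmented))
```

In this toy batch only the first transition survives the filter, so a replay buffer would receive one extra reversed transition rather than two.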

Paper Structure

This paper contains 43 sections, 9 equations, 28 figures, 3 tables, 1 algorithm.

Figures (28)

  • Figure 1: For a task pair, the proposed TR-DRL framework learns dynamics and potential models, leverages trajectory reversal augmentation with dynamics-aware filtering and time reversal symmetry guided reward shaping, and boosts sample efficiency in both tasks.
  • Figure 2: Examples of fully and partially reversible trajectories. (a) Fully reversible: An example of opening the door outward by grasping the handle; (b) Partially reversible: An example of closing the door from inward by pushing the door.
  • Figure 3: Overview of our TR-DRL. We learn dynamics and potential models, apply reversal augmentation on transitions from the reversed task, and apply time reversal symmetry guided reward shaping on all transitions.
  • Figure 4: Results for the single-task setting in 10 environments from Robosuite. Top: Plots and table for IQM of success rate. Bottom: Curves of success rate in two pairs of reversible tasks. "Single-Task SAC": baseline; "+reversal aug": trajectory reversal augmentation with dynamics-aware filtering; "+reversal reward shaping": time reversal symmetry guided reward shaping.
  • Figure 5: IQM of success rate for multi-task settings in 10 environments from Robosuite. "Task-Cond" and "Multi-Head" are short for "task-conditioned" and "multi-headed" respectively.
  • ...and 23 more figures