MRHER: Model-based Relay Hindsight Experience Replay for Sequential Object Manipulation Tasks with Sparse Rewards

Yuming Huang, Bin Ren, Ziming Xu, Lianghong Wu

TL;DR

MRHER (Model-based Relay Hindsight Experience Replay) is a novel model-based RL framework built around a new robust model-based relabeling method, Foresight relabeling (FR). FR predicts the future trajectory of the hindsight state and relabels the expected goal as a goal achieved on that virtual future trajectory.

Abstract

Sparse rewards pose a significant challenge to achieving high sample efficiency in goal-conditioned reinforcement learning (RL). Specifically, in sequential manipulation tasks, the agent receives failure rewards until it successfully completes the entire manipulation task, which leads to low sample efficiency. To tackle this issue and improve sample efficiency, we propose a novel model-based RL framework called Model-based Relay Hindsight Experience Replay (MRHER). MRHER breaks down a continuous task into subtasks with increasing complexity and utilizes the previous subtask to guide the learning of the subsequent one. Instead of using Hindsight Experience Replay (HER) in every subtask, we design a new robust model-based relabeling method called Foresight relabeling (FR). FR predicts the future trajectory of the hindsight state and relabels the expected goal as a goal achieved on the virtual future trajectory. By incorporating FR, MRHER effectively captures more information from historical experiences, leading to improved sample efficiency, particularly in object-manipulation environments. Experimental results demonstrate that MRHER exhibits state-of-the-art sample efficiency in benchmark tasks, outperforming RHER by 13.79% and 14.29% in the FetchPush-v1 environment and FetchPickandPlace-v1 environment, respectively.
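
The relabeling step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the dictionary keys, the 0/-1 reward convention with a 0.05 tolerance, conditioning the rollout policy on the original goal, and the uniform choice among virtual achieved goals are all assumptions made for illustration. The idea shown is only the one stated above: roll a dynamics model forward from a chosen start state to obtain a virtual trajectory, then replace the expected goal with a goal achieved on that trajectory and recompute the sparse reward.

```python
import numpy as np


def sparse_reward(achieved, goal, tol=0.05):
    """Sparse goal-reaching reward: 0 on success, -1 on failure (assumed convention)."""
    return 0.0 if np.linalg.norm(achieved - goal) <= tol else -1.0


def foresight_relabel(transition, start_state, policy, model, achieved_goal,
                      horizon, rng):
    """Relabel one transition with a goal achieved on a virtual future trajectory."""
    state = start_state
    virtual_goals = []
    for _ in range(horizon):
        # Assumption: the rollout policy is conditioned on the original goal.
        action = policy(state, transition["goal"])
        # `model` stands in for a learned dynamics model (hypothetical callable).
        state = model(state, action)
        virtual_goals.append(achieved_goal(state))
    # Assumption: pick one virtual achieved goal uniformly at random.
    new_goal = virtual_goals[rng.integers(len(virtual_goals))]
    relabeled = dict(transition)  # shallow copy; the original is left untouched
    relabeled["goal"] = new_goal
    relabeled["reward"] = sparse_reward(
        achieved_goal(transition["next_state"]), new_goal
    )
    return relabeled


# Toy 1-D point environment (hypothetical): the state is a position and the
# achieved goal is the position itself.
def toy_policy(state, goal):
    return np.clip(goal - state, -0.1, 0.1)


def toy_model(state, action):
    return state + action


def toy_achieved_goal(state):
    return state


transition = {
    "state": np.array([0.0]),
    "action": np.array([0.1]),
    "next_state": np.array([0.1]),
    "goal": np.array([1.0]),
    "reward": -1.0,
}
relabeled = foresight_relabel(
    transition, transition["state"], toy_policy, toy_model, toy_achieved_goal,
    horizon=1, rng=np.random.default_rng(0),
)
```

In this toy run the original transition fails against its far-away goal, while the relabeled copy carries a goal taken from the one-step virtual rollout, which the real next state happens to satisfy. With a longer horizon the sampled virtual goal may lie further ahead, and the relabeled reward can remain a failure reward.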

Paper Structure

This paper contains 19 sections, 6 equations, 10 figures, 1 table, 2 algorithms.

Figures (10)

  • Figure 1: The Identical Non-Negative Reward (INNR) problem of HER.
  • Figure 2: The INNR problem of Model-based Relabeling (MBR). Real trajectories are collected by the agent based on the past policy $\mu$, while virtual trajectories are collected based on the newest policy $\pi$ and dynamical models. HER uses the achieved goals in the historical trajectories to relabel the expected goals. MBR predicts the future trajectories of the current transition for goal relabeling.
  • Figure 3: The decomposition and recombination of a sequential task.
  • Figure 4: The SGES in a block-pushing task.
  • Figure 5: Diagram of Foresight relabeling. Foresight relabeling selects a beginning state, generates a virtual trajectory, and then uses the achieved goals on that virtual trajectory to relabel the expected goal.
  • ...and 5 more figures