Hierarchical Reinforcement Learning Based on Planning Operators

Jing Zhang, Emmanuel Dean, Karinne Ramirez-Amaro

TL;DR

We develop a dual-purpose high-level operator that can be used both in holistic planning and as an independent, reusable policy, offering a flexible solution for long-horizon tasks such as stacking and inserting a cube.

Abstract

Long-horizon manipulation tasks such as stacking are a longstanding challenge in robotic manipulation, particularly for reinforcement learning (RL) methods, which often struggle to learn the correct sequence of actions for achieving these complex goals. Symbolic planning methods offer a good way to obtain this sequence through high-level reasoning; however, planners often fall short of the low-level control specificity needed for precise execution. This paper introduces a novel framework that integrates symbolic planning with hierarchical RL through the cooperation of high-level operators and low-level policies. Our contribution integrates planning operators (e.g., their preconditions and effects) into a hierarchical RL algorithm based on the Scheduled Auxiliary Control (SAC-X) method. We developed a dual-purpose high-level operator, which can be used both in holistic planning and as an independent, reusable policy. Our approach offers a flexible solution for long-horizon tasks, e.g., stacking a cube. The experimental results show that our proposed method achieved an average success rate of 97.2% for learning and executing the whole stack sequence, as well as high success rates for learning independent policies, e.g., reach (98.9%), lift (99.7%), and stack (85%). Training time is also reduced by 68% when using our proposed approach.

Paper Structure

This paper contains 19 sections, 6 equations, 7 figures, 4 tables, 2 algorithms.

Figures (7)

  • Figure 1: Overview of our approach of combining RL and planning operators.
  • Figure 2: Left: the traditional RL framework and a symbolic planning framework. Right: the framework proposed in this paper, which combines RL and operators. Here $s$ denotes the state of the blocks or the gripper.
  • Figure 3: Operator structure, composed of preconditions and effects.
  • Figure 4: Example of executing the learned sequence and policies for the task of stacking 2 blocks. At step $1$, the agent reaches the red block. At step $2$, since reach is finished, the next operator is close (step $3$), after which lift is executed (step $4$). At step $5$, the block is unexpectedly dropped, so the agent needs to re-reach it ($6$) and lift it again ($7$). Finally, the agent successfully moves the block and stacks it on the blue one ($8$), completing the whole STACK task. Video: https://youtu.be/FQYWLhLTeOg
  • Figure 5: Success rate of chained policy execution for the STACK (2 blocks) and INSERT (block into a slot) tasks.
  • ...and 2 more figures
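
The operator structure of Figure 3 and the recovery behaviour described in Figure 4 can be illustrated with a small symbolic sketch. This is not the paper's implementation: the fact names (`near`, `closed`, `lifted`, `stacked`), the exact preconditions and effects, the `run` helper, and the way a dropped block is modelled are all assumptions made for illustration, and each learned low-level policy is replaced by simply applying the operator's effects.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    """A symbolic operator: preconditions that must hold, effects after success."""
    name: str
    preconditions: dict
    effects: dict

    def applicable(self, state):
        # Every precondition fact must match the current symbolic state.
        return all(state[fact] == value for fact, value in self.preconditions.items())


# Hypothetical operator set; the facts and their values are illustrative assumptions.
OPERATORS = [
    Operator("reach", {"near": False}, {"near": True}),
    Operator("close", {"near": True, "closed": False}, {"closed": True}),
    Operator("lift", {"near": True, "closed": True, "lifted": False}, {"lifted": True}),
    Operator("stack", {"lifted": True, "stacked": False}, {"stacked": True}),
]

INITIAL = {"near": False, "closed": False, "lifted": False, "stacked": False}


def run(initial, goal, disturbances=None, max_steps=20):
    """Repeatedly pick the first applicable operator until the goal facts hold.

    `disturbances` maps a step index to facts overwritten right after that step,
    e.g. to mimic the block being dropped.
    """
    state = dict(initial)
    disturbances = disturbances or {}
    trace = []
    for step in range(max_steps):
        if all(state[fact] == value for fact, value in goal.items()):
            break
        op = next((o for o in OPERATORS if o.applicable(state)), None)
        if op is None:
            break  # no operator applies; give up
        state.update(op.effects)  # stand-in for executing the learned low-level policy
        state.update(disturbances.get(step, {}))  # optional unexpected event
        trace.append(op.name)
    return trace


# Assumed model of a drop right after the first lift: the block is no longer
# near the gripper and no longer lifted.
drop = {2: {"near": False, "lifted": False}}
print(run(INITIAL, {"stacked": True}, disturbances=drop))
```

With the drop injected, the printed sequence is `['reach', 'close', 'lift', 'reach', 'lift', 'stack']`, which mirrors the re-reach and re-lift recovery in Figure 4. Without it, the sequence is simply `['reach', 'close', 'lift', 'stack']`. Because the next operator is chosen from the current state rather than from a fixed script, the same operators serve both as steps of the full sequence and as independently callable skills.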