Generalization of Compositional Tasks with Logical Specification via Implicit Planning

Duo Xu, Faramarz Fekri

TL;DR

This paper introduces a hierarchical RL framework that improves both the efficiency and the optimality of generalization to compositional tasks, and reports comprehensive experiments demonstrating these advantages over previous methods.

Abstract

In this study, we address the challenge of learning generalizable policies for compositional tasks defined by logical specifications. These tasks consist of multiple temporally extended sub-tasks. Because of sub-task inter-dependencies and the sparse rewards of long-horizon tasks, existing reinforcement learning (RL) approaches, such as task-conditioned and goal-conditioned policies, still suffer from slow convergence and sub-optimal performance when generalizing to compositional tasks. To overcome these limitations, we introduce a new hierarchical RL framework that improves the efficiency and optimality of task generalization. At the high level, we present an implicit planner designed specifically for generalizing to compositional tasks. This planner selects the next sub-task and estimates the multi-step return for completing the remaining task from the current state. It learns a latent transition model and plans in the latent space using a graph neural network (GNN). The sub-task selected by the high-level planner then guides the low-level agent through long-horizon tasks, while the multi-step return encourages the low-level policy to account for dependencies among future sub-tasks, improving its optimality. We conduct comprehensive experiments to demonstrate the framework's advantages over previous methods in terms of both efficiency and optimality.

Paper Structure

This paper contains 30 sections, 5 equations, 9 figures.

Figures (9)

  • Figure 1: Motivating example 1. Task: first go to red ball, and then blue ball. Red: reaching red ball. Blue: reaching blue ball.
  • Figure 2: Motivating example 2. Left: map. Right: task automaton with self loops omitted.
  • Figure 3: Diagram of the implicit planner as the high-level agent. The DAG (abstract graph) of the task is shown in the rightmost figure. The latent tree is spanned by the encoder $\mathcal{E}_{\theta}$ and the latent transition model $\mathcal{T}_{\theta}$ in the forward pass, while the features of future sub-tasks are extracted by the GNN ($\mathcal{M}_{\theta}, \mathcal{U}_{\theta}$) in the backward pass. The sub-task $\eta$ and estimated return $V$ are predicted by the policy network $\pi_{\theta}^h$ and the value network $V^h_{\theta}$, respectively. Note that in the GNN, every edge is labeled by a corresponding sub-task derived from the task DAG, and the feature of an edge is the binary encoding of the positive and negative propositions of the labeled sub-task.
  • Figure 4: Diagram of processing inputs to the low-level policy $\pi^l_{\omega}$ and value function $V^l_{\omega}$. $\eta$ is the sub-task assigned by the high-level module. $\phi$ is the target task to complete. $\phi'$ is the progression of $\phi$ with $\eta$. The embedding is the representation of $\phi'$ produced by the GNN. $s_t$ is the environmental observation. "Enc" is the encoder mapping the observation into a latent vector.
  • Figure 5: Environments
  • ...and 4 more figures
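
To make two notions from the captions of Figures 3 and 4 concrete, the following is a minimal Python sketch, not the authors' implementation. It illustrates (i) a binary edge feature built from the positive and negative propositions of a sub-task and (ii) the progression of a task with a completed sub-task. The proposition vocabulary `PROPS`, the `(positive, negative)` sub-task representation, the dictionary-based DAG, and the function names `encode_subtask` and `progress` are all assumptions introduced here for illustration; the paper may define these objects differently.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical proposition vocabulary (assumption; not taken from the paper).
PROPS: List[str] = ["a", "b", "c"]

# Assumed sub-task representation: (propositions to reach, propositions to avoid).
SubTask = Tuple[FrozenSet[str], FrozenSet[str]]

# Assumed task DAG: node -> list of (sub-task label on the edge, successor node).
TaskDAG = Dict[str, List[Tuple[SubTask, str]]]


def encode_subtask(eta: SubTask, props: List[str] = PROPS) -> List[int]:
    """Binary edge feature: one bit per positive proposition,
    followed by one bit per negative proposition."""
    pos, neg = eta
    return [int(p in pos) for p in props] + [int(p in neg) for p in props]


def progress(dag: TaskDAG, node: str, eta: SubTask) -> str:
    """Return the DAG node reached after completing sub-task eta at `node`.
    The remaining task is the part of the DAG reachable from that node."""
    for label, successor in dag.get(node, []):
        if label == eta:
            return successor
    # eta labels no outgoing edge here, so the remaining task is unchanged.
    return node


# Illustrative task: reach "a" while avoiding "c", then reach "b".
reach_a_avoid_c: SubTask = (frozenset({"a"}), frozenset({"c"}))
reach_b: SubTask = (frozenset({"b"}), frozenset())

dag: TaskDAG = {
    "q0": [(reach_a_avoid_c, "q1")],
    "q1": [(reach_b, "done")],
    "done": [],
}

edge_feature = encode_subtask(reach_a_avoid_c)   # [1, 0, 0, 0, 0, 1]
after_first = progress(dag, "q0", reach_a_avoid_c)  # "q1"
unchanged = progress(dag, "q0", reach_b)            # "q0"
```

In this sketch the edge feature has length $2|P|$ for a vocabulary of $|P|$ propositions, and the progression simply follows the edge labeled by the completed sub-task. In the framework described above, such edge features would be inputs to the GNN and the progressed task $\phi'$ would be embedded for the low-level policy; both of those learned components are omitted here.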