Learning Top-k Subtask Planning Tree based on Discriminative Representation Pre-training for Decision Making

Jingqing Ruan, Kaishen Wang, Qingyang Zhang, Dengpeng Xing, Bo Xu

TL;DR

The paper tackles long-horizon decision making by decomposing complex tasks into subtasks and learning discriminative representations for each subtask via a multiple-encoder and individual-predictor framework. It introduces a top-K subtask planning tree that dynamically expands to guide policy learning with forward-looking reasoning, enabling robust decisions on unseen tasks. Empirical results on BabyAI tasks show improved subtask discrimination, faster training, and superior execution performance compared to strong baselines, with insights on how tree width and depth influence planning. The approach offers a scalable, interpretable framework for subtask-conditioned RL that combines explicit representation learning with principled planning to handle complex dynamics and sparse feedback in real-world-like tasks.

Abstract

Many complicated real-world tasks can be broken down into smaller, more manageable parts, and planning with prior knowledge extracted from these simpler pieces is crucial for humans to make accurate decisions. Replicating this process remains a challenge for AI agents and raises two questions: how can discriminative knowledge representations be extracted from priors, and how can a rational plan be developed to decompose complex problems? Most existing representation learning methods that employ a single encoder structure are fragile and sensitive to complex and diverse dynamics. To address this issue, we introduce a multiple-encoder and individual-predictor regime to learn task-essential representations from sufficient data for simple subtasks. Multiple encoders can extract adequate task-relevant dynamics without confusion, and the shared predictor can discriminate the task characteristics. We also use the attention mechanism to generate a top-k subtask planning tree, which customizes subtask execution plans to guide complex decisions on unseen tasks. Flexibly adjusting the depth and width of the planning tree makes the plan both forward-looking and global. On a challenging platform composed of basic simple tasks and combinatorially rich synthetic tasks, our method consistently outperforms competitive baselines, and the results demonstrate the benefits of our design.
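
As a rough illustration of the top-k subtask planning tree described above, the following NumPy sketch scores candidate subtasks with scaled dot-product attention, keeps the `width` highest-weighted ones as children, and recurses to a fixed `depth`. It is only a sketch under stated assumptions: the function names, the embedding sizes, the random embeddings, and the rule used to form the next query are all hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def expand(query, subtask_embs, width, depth):
    """Recursively build a tree whose nodes are the top-`width` attended subtasks."""
    if depth == 0:
        return []
    # Scaled dot-product attention of the query over all subtask embeddings.
    scores = softmax(subtask_embs @ query / np.sqrt(query.shape[0]))
    top = np.argsort(scores)[::-1][:width]  # indices of the largest weights
    children = []
    for idx in top:
        # Assumption: the next query mixes the current query with the chosen subtask.
        next_query = 0.5 * (query + subtask_embs[idx])
        children.append({
            "subtask": int(idx),
            "weight": float(scores[idx]),
            "children": expand(next_query, subtask_embs, width, depth - 1),
        })
    return children


def count_nodes(tree):
    """Total number of nodes in a tree produced by `expand`."""
    return sum(1 + count_nodes(node["children"]) for node in tree)


rng = np.random.default_rng(0)
subtask_embs = rng.normal(size=(4, 8))  # hypothetical: 4 subtasks, 8-dim embeddings
query = rng.normal(size=8)              # hypothetical embedding of the current state
tree = expand(query, subtask_embs, width=2, depth=3)
num_nodes = count_nodes(tree)           # 2 + 4 + 8 nodes for width 2, depth 3
```

In this toy version, increasing `width` considers more alternative subtasks at each step, while increasing `depth` looks further ahead; the node count grows geometrically with depth.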

Paper Structure (23 sections, 17 equations, 18 figures, 5 tables, 1 algorithm)

This paper contains 23 sections, 17 equations, 18 figures, 5 tables, 1 algorithm.

Figures (18)

  • Figure 1: Two example scenarios in Overcooked AI (Carroll et al., 2019): cutting food (top) and serving customers (bottom). Cutting food requires solving a sequence of subtasks: preparing the chopping board, picking up an onion, and cutting it with a knife. Serving customers requires washing dishes, dishing up, and delivering.
  • Figure 2: The multiple-encoder and individual-predictor learning regime. Each subtask encoder $\mathcal{E}^i$ encodes the state-action pair $(s_t, a_{t})^i \in \mathcal{T}^i$ and outputs the compact subtask representation $z_t^i$ at each time step. The shared subtask predictor predicts the reward $r_t$ and the next state $s_{t+1}$. The contrastive and prediction losses are designed to train the subtask encoders and the predictor end-to-end.
  • Figure 3: The generation of the top-$K$ subtask planning tree.
  • Figure 4: Schematics of the four basic subtasks. The dashed lines denote the path to the goal. The agent, represented by the red triangle, can only partially observe the grid, and the light-grey shaded area represents its field of view. The missions are: (a) Open the red door; (b) Go to the red box; (c) Put the grey box next to the red key; (d) Pick up the green ball.
  • Figure 5: Illustration of one of the BossLevel scenarios. The mission is: "Pick up a red box and go to the purple door".
  • ...and 13 more figures
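
The learning regime summarized in the Figure 2 caption (one encoder per subtask, a predictor shared across subtasks, and contrastive plus prediction losses) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration: the linear-plus-tanh encoders, the linear predictor, the dimensions, the random data, and the InfoNCE-style contrastive loss that treats representations from the same subtask as positives. The paper's actual architectures and loss definitions are not given in this excerpt, and no training step is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, LATENT_DIM, NUM_SUBTASKS, BATCH = 6, 3, 4, 4, 8

# One encoder per subtask (hypothetical linear stand-ins for each E^i).
encoders = [rng.normal(scale=0.1, size=(STATE_DIM + ACTION_DIM, LATENT_DIM))
            for _ in range(NUM_SUBTASKS)]
# A single predictor shared by all subtasks: z -> (next state, reward).
predictor = rng.normal(scale=0.1, size=(LATENT_DIM, STATE_DIM + 1))


def encode(subtask_id, states, actions):
    """Map a batch of (s_t, a_t) pairs from subtask i to representations z_t^i."""
    x = np.concatenate([states, actions], axis=1)
    return np.tanh(x @ encoders[subtask_id])


def prediction_loss(z, next_states, rewards):
    """Mean squared error of the shared predictor on s_{t+1} and r_t."""
    out = z @ predictor
    pred_next, pred_reward = out[:, :STATE_DIM], out[:, STATE_DIM]
    return np.mean((pred_next - next_states) ** 2) + np.mean((pred_reward - rewards) ** 2)


def contrastive_loss(z, labels, temperature=0.5):
    """InfoNCE-style loss (assumed form): same-subtask pairs are positives."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-pairs from the softmax
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    return -np.mean(log_prob[positives])


zs, label_list, pred_loss = [], [], 0.0
for i in range(NUM_SUBTASKS):
    # Random transitions stand in for trajectories collected on subtask i.
    s = rng.normal(size=(BATCH, STATE_DIM))
    a = rng.normal(size=(BATCH, ACTION_DIM))
    s_next = rng.normal(size=(BATCH, STATE_DIM))
    r = rng.normal(size=BATCH)
    z = encode(i, s, a)
    pred_loss += prediction_loss(z, s_next, r) / NUM_SUBTASKS
    zs.append(z)
    label_list.append(np.full(BATCH, i))

all_z, all_labels = np.concatenate(zs), np.concatenate(label_list)
con_loss = contrastive_loss(all_z, all_labels)
total_loss = pred_loss + con_loss  # both terms would be minimized jointly
```

Keeping a separate encoder per subtask while sharing the predictor means every subtask's representation has to be usable by the same prediction head, which is one plausible reading of how the shared predictor helps discriminate task characteristics.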

Theorems & Definitions (1)

  • Definition 1