Learning Top-k Subtask Planning Tree based on Discriminative Representation Pre-training for Decision Making
Jingqing Ruan, Kaishen Wang, Qingyang Zhang, Dengpeng Xing, Bo Xu
TL;DR
The paper tackles long-horizon decision making by decomposing complex tasks into subtasks and learning a discriminative representation for each subtask with a multiple-encoder and individual-predictor framework. It introduces a top-k subtask planning tree that expands dynamically to guide policy learning with forward-looking reasoning, supporting decisions on unseen tasks. Experiments on BabyAI tasks show improved subtask discrimination, faster training, and better execution performance than strong baselines, and they examine how tree width and depth influence planning. The approach offers an interpretable framework for subtask-conditioned RL that combines explicit representation learning with tree-based planning to handle complex dynamics and sparse feedback.
Abstract
Many complicated real-world tasks can be broken down into smaller, more manageable parts, and planning with prior knowledge extracted from these simplified pieces is crucial for humans to make accurate decisions. However, replicating this process remains a challenge for AI agents and naturally raises two questions: how can discriminative knowledge representations be extracted from priors, and how can a rational plan be developed to decompose complex problems? Most existing representation learning methods rely on a single-encoder structure and are fragile and sensitive to complex and diverse dynamics. To address this issue, we introduce a multiple-encoder and individual-predictor regime to learn task-essential representations from sufficient data for simple subtasks. Multiple encoders can extract adequate task-relevant dynamics without confusion, and the shared predictor can discriminate the task characteristics. We also use the attention mechanism to generate a top-k subtask planning tree, which customizes subtask execution plans to guide complex decisions on unseen tasks. Flexibly adjusting the depth and width of the planning tree makes the planning both forward-looking and global. Empirical results on a challenging platform composed of basic simple tasks and combinatorially rich synthetic tasks show that our method consistently outperforms competitive baselines and demonstrate the benefits of our design.
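
To make the idea of an attention-generated top-k planning tree concrete, the following is a minimal sketch, not the authors' implementation. It assumes each subtask already has a pre-trained embedding, scores subtasks with scaled dot-product attention against a query vector, keeps the k highest-weighted subtasks as children, and recurses to a fixed depth. The names `Node`, `attention`, `expand`, `build_tree`, and `count_nodes` are hypothetical, and the query update (adding the chosen subtask embedding to the query) is an illustrative assumption rather than the rule used in the paper.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    subtask: int      # index of the subtask chosen at this node (-1 for the root)
    score: float      # attention weight that selected this subtask
    children: list = field(default_factory=list)


def attention(query, keys):
    """Softmax of scaled dot products between one query and every key."""
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expand(query, keys, k, depth):
    """Return the top-k children for this query, each expanded recursively."""
    if depth == 0:
        return []
    weights = attention(query, keys)
    top = sorted(range(len(keys)), key=lambda i: weights[i], reverse=True)[:k]
    children = []
    for i in top:
        # Assumption for illustration: the next query is the current query
        # shifted by the embedding of the subtask just selected.
        next_query = [q + e for q, e in zip(query, keys[i])]
        children.append(Node(i, weights[i], expand(next_query, keys, k, depth - 1)))
    return children


def build_tree(query, keys, k, depth):
    """Build a planning tree of width k and the given depth below a root node."""
    return Node(-1, 1.0, expand(query, keys, k, depth))


def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.children)


# Toy usage: three subtask embeddings, width k=2, depth 2.
subtask_embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
tree = build_tree([1.0, 0.0], subtask_embeddings, k=2, depth=2)
print(count_nodes(tree))  # 7 nodes: 1 root + 2 children + 4 grandchildren
```

In this sketch the width k controls how many alternative subtasks are considered at each step and the depth controls how far ahead the plan looks, which mirrors the width and depth trade-off described above.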
