Any House Any Task: Scalable Long-Horizon Planning for Abstract Human Tasks
Zhihong Liu, Yang Li, Rengming Huang, Cewu Lu, Panpan Cai
TL;DR
AHAT tackles open-world, long-horizon household planning under abstract human instructions by training an LLM to generate a sequence of PDDL subgoals grounded in a textual scene graph; a symbolic planner then solves these subgoals to yield executable plans. The model is trained with TGPO, a trace-guided RL method that externally corrects intermediate reasoning traces to improve subgoal decomposition, aided by constrained sampling and a two-pass optimization loop. A large synthetic dataset (50k tasks) built from 308 scene graphs and 1.6k personas supports both supervised fine-tuning and TGPO. This training yields strong generalization across in-domain and out-of-domain tasks, and performance that scales as environment size, plan length, and constraint complexity grow. On the AHAT, human, and public benchmarks, AHAT achieves higher success rates and faster planning times than prompting-based, LLM-only, and prior RL-based approaches.
Abstract
Open-world, language-conditioned task planning is crucial for robots operating in large-scale household environments. While many recent works attempt to address this problem using Large Language Models (LLMs) via prompting or training, scalability remains a key challenge: performance often degrades rapidly with increasing environment size, plan length, instruction ambiguity, and constraint complexity. In this work, we propose Any House Any Task (AHAT), a household task planner optimized for long-horizon planning in large environments given ambiguous human instructions. At its core, AHAT uses an LLM trained to map task instructions and textual scene graphs into grounded subgoals defined in the Planning Domain Definition Language (PDDL). These subgoals are then solved through explicit symbolic reasoning to generate feasible and optimal long-horizon plans. To enhance the model's ability to decompose complex and ambiguous intentions, we introduce TGPO, a novel reinforcement learning algorithm that integrates external correction of intermediate reasoning traces into Group Relative Policy Optimization (GRPO). Experiments demonstrate that AHAT achieves significant performance gains over state-of-the-art prompting, planning, and learning methods, particularly on human-style household tasks, where instructions are brief but the required execution plans are complex.
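To make the decompose-then-solve idea concrete, the following is a minimal sketch under stated assumptions, not AHAT's implementation. The LLM step is replaced by a hard-coded list of subgoals, and a breadth-first search stands in for a PDDL planner. The toy scene graph `ROOMS`, the `("at", object, room)` subgoal format, and the function names `successors`, `holds`, `solve`, and `plan_task` are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical toy scene graph: each room maps to its adjacent rooms.
ROOMS = {"kitchen": ["hall"], "hall": ["kitchen", "bedroom"], "bedroom": ["hall"]}

def successors(state):
    """Yield (action, next_state) pairs. A state is (robot_room, held, objects),
    where objects is a sorted tuple of (name, room) pairs for resting objects."""
    robot, held, objs = state
    for nxt in ROOMS[robot]:
        yield ("move", robot, nxt), (nxt, held, objs)
    if held is None:
        for name, room in objs:
            if room == robot:
                rest = tuple(o for o in objs if o[0] != name)
                yield ("pick", name, robot), (robot, name, rest)
    else:
        placed = tuple(sorted(objs + ((held, robot),)))
        yield ("place", held, robot), (robot, None, placed)

def holds(state, goal):
    """Check a subgoal of the assumed form ("at", obj, room)."""
    _, obj, room = goal
    return (obj, room) in state[2]

def solve(state, goal):
    """Breadth-first search: a shortest action sequence that reaches the subgoal."""
    queue, seen = deque([(state, [])]), {state}
    while queue:
        cur, plan = queue.popleft()
        if holds(cur, goal):
            return plan, cur
        for action, nxt in successors(cur):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + [action]))
    raise ValueError(f"subgoal {goal} is unreachable")

def plan_task(state, subgoals):
    """Solve subgoals in order, chaining each end state into the next search."""
    full_plan = []
    for goal in subgoals:
        steps, state = solve(state, goal)
        full_plan.extend(steps)
    return full_plan, state

# Stand-in for the LLM output: an ordered list of grounded subgoals.
start = ("hall", None, (("book", "bedroom"), ("cup", "kitchen")))
subgoals = [("at", "cup", "bedroom"), ("at", "book", "kitchen")]
plan, final = plan_task(start, subgoals)
```

Each search in this sketch only has to reach one subgoal, so the explored state space stays small even when the full plan is long. Note that the sketch finds a shortest plan per subgoal, which does not by itself guarantee that the concatenated plan is globally shortest, and it does not check that earlier subgoals remain satisfied while later ones are solved.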
