Any House Any Task: Scalable Long-Horizon Planning for Abstract Human Tasks

Zhihong Liu, Yang Li, Rengming Huang, Cewu Lu, Panpan Cai

TL;DR

AHAT tackles open-world, long-horizon household planning under abstract human instructions. An LLM is trained to generate a sequence of PDDL subgoals grounded in a textual scene graph, and a symbolic planner then solves these subgoals to yield executable plans. The model is trained with TGPO, a trace-guided RL method that externally corrects intermediate reasoning traces to improve subgoal decomposition, aided by constrained sampling and a two-pass optimization loop. A large synthetic dataset (50k tasks) built from 308 scene graphs and 1.6k personas supports supervised fine-tuning and TGPO, enabling generalization across in-domain and out-of-domain tasks and performance that scales as environment size, plan length, and constraint complexity grow. Across AHAT, human, and public benchmarks, AHAT achieves higher success rates and faster planning times than prompting-based, LLM-only, and prior RL-based approaches.
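
The subgoal-then-plan pipeline described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the facts and actions are a hypothetical two-room toy domain encoded as Python sets rather than PDDL, the subgoal list is written by hand where AHAT would use the LLM's output, and a breadth-first search stands in for the PDDL planner.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    """STRIPS-style action over string facts (hypothetical toy encoding)."""
    name: str
    pre: frozenset      # facts that must hold before the action
    add: frozenset      # facts made true by the action
    delete: frozenset   # facts made false by the action


def bfs_plan(state, goal, actions):
    """Return (plan, final_state) for a shortest plan reaching `goal`, or (None, state)."""
    frontier = deque([(state, [])])
    seen = {state}
    while frontier:
        current, plan = frontier.popleft()
        if goal <= current:  # every goal fact holds
            return plan, current
        for a in actions:
            if a.pre <= current:
                nxt = (current - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [a.name]))
    return None, state


def solve_subgoals(state, subgoals, actions):
    """Solve subgoals in order, starting each search from the previous end state."""
    full_plan = []
    for goal in subgoals:
        plan, state = bfs_plan(state, goal, actions)
        if plan is None:
            return None  # a subgoal is unreachable, so the decomposition fails
        full_plan.extend(plan)
    return full_plan


f = frozenset
ACTIONS = [
    Action("move_kitchen_living", f({"robot_at_kitchen"}), f({"robot_at_living"}), f({"robot_at_kitchen"})),
    Action("move_living_kitchen", f({"robot_at_living"}), f({"robot_at_kitchen"}), f({"robot_at_living"})),
    Action("pick_cup_kitchen", f({"robot_at_kitchen", "cup_at_kitchen", "hand_empty"}),
           f({"holding_cup"}), f({"cup_at_kitchen", "hand_empty"})),
    Action("place_cup_living", f({"robot_at_living", "holding_cup"}),
           f({"cup_at_living", "hand_empty"}), f({"holding_cup"})),
]
INITIAL = f({"robot_at_living", "cup_at_kitchen", "hand_empty"})
# Hand-written stand-in for the subgoal sequence an LLM would produce.
SUBGOALS = [f({"holding_cup"}), f({"cup_at_living"})]

if __name__ == "__main__":
    print(solve_subgoals(INITIAL, SUBGOALS, ACTIONS))
```

Each search only has to reach the next subgoal, so its depth is bounded by the length of that segment rather than of the whole plan. The sketch does not check that earlier subgoals remain satisfied after later ones are solved; a real system would need to handle that.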

Abstract

Open-world, language-conditioned task planning is crucial for robots operating in large-scale household environments. While many recent works attempt to address this problem using Large Language Models (LLMs) via prompting or training, scalability remains a key challenge: performance often degrades rapidly with increasing environment size, plan length, instruction ambiguity, and constraint complexity. In this work, we propose Any House Any Task (AHAT), a household task planner optimized for long-horizon planning in large environments given ambiguous human instructions. At its core, AHAT utilizes an LLM trained to map task instructions and textual scene graphs into grounded subgoals defined in the Planning Domain Definition Language (PDDL). These subgoals are subsequently solved to generate feasible and optimal long-horizon plans through explicit symbolic reasoning. To enhance the model's ability to decompose complex and ambiguous intentions, we introduce TGPO, a novel reinforcement learning algorithm that integrates external correction of intermediate reasoning traces into Group Relative Policy Optimization (GRPO). Experiments demonstrate that AHAT achieves significant performance gains over state-of-the-art prompting, planning, and learning methods, particularly in human-style household tasks characterized by brief instructions but requiring complex execution plans.
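
For readers unfamiliar with GRPO, the group-relative advantage it builds on can be sketched as follows. This is a minimal illustration of the standard GRPO normalization only: it does not reproduce TGPO's trace correction, the function name and the `eps` parameter are assumptions, and the use of the population standard deviation is one common choice rather than something stated in the abstract.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sample's reward against its own group.

    `rewards` holds the scalar rewards of all responses sampled for a
    single prompt; each advantage is (r - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled plans for one task: two succeed (1.0), two fail (0.0).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
    # A group in which every sample fails carries no relative signal.
    print(group_relative_advantages([0.0, 0.0, 0.0, 0.0]))
```

Because advantages are computed relative to the group, a group whose samples all receive the same reward yields zero advantage for every sample, so it contributes no gradient signal under this normalization.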

Paper Structure (27 sections, 10 equations, 3 figures, 4 tables)

Figures (3)

  • Figure 1: In large-scale environments, AHAT receives abstract instructions and a scene graph, generates a decomposition trace and corresponding subgoals. These subgoals are then solved using a PDDL planner, resulting in an executable long-horizon plan that satisfies the user's requirements.
  • Figure 2: Overview of AHAT. (a) Data Generation: Task synthesis and annotation. (b) Policy Supervision: Supervised fine-tuning (SFT) on the constructed long-horizon household planning dataset. (c) Trace-Guided Policy Optimization: The reinforcement learning loop that integrates external correction of intermediate reasoning traces, improving subgoal generation and task decomposition through constrained sampling, and optimizing the AHAT model for robust planning performance.
  • Figure 3: Scalability Test Results showing the variation of success rate with: (a) Scene Graph Size, (b) Plan Length, (c) Task Abstractness, and (d) Constraint Complexity.