Scalable Task Planning via Large Language Models and Structured World Representations
Rodrigo Pérez-Dattari, Zhaoting Li, Robert Babuška, Jens Kober, Cosimo Della Santina
TL;DR
This paper addresses the intractability of task planning in large-scale environments by combining a graph-based world model with taxonomy-guided object reduction and LLM-driven pruning. The world state is represented as a graph S=(O,R) of objects O and relations R, and the state space is reduced before planning in two LLM-guided steps: (i) a taxonomy-aware object selection that narrows the set of relevant objects, and (ii) a relationship-based refinement that accounts for environment-specific interactions. Planning on the pruned state graph, with either search-based or LLM-based policies, achieves high success rates in VirtualHome and transfers to real-world tasks with a 7-DoF manipulator, with GPT-4o consistently outperforming GPT-3.5. The approach improves the scalability and practicality of robotic task planning and offers a zero-shot way to handle environments with thousands of objects without retraining, as validated in extensive simulation and real-robot experiments.
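To make the two pruning steps concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the state graph is stored as a set of object names and a set of (subject, predicate, object) triples. The names `StateGraph`, `prune_state`, `fake_select_categories`, and `fake_relation_matters` are hypothetical, and the two `fake_*` functions are hard-coded stand-ins for the LLM queries that the summary describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Set, Tuple

Relation = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class StateGraph:
    """State S = (O, R): a set of objects and a set of relations between them."""
    objects: FrozenSet[str]
    relations: FrozenSet[Relation]


def prune_state(
    state: StateGraph,
    taxonomy: Dict[str, str],
    select_categories: Callable[[str, Set[str]], Set[str]],
    relation_matters: Callable[[str, Relation], bool],
    goal: str,
) -> StateGraph:
    # Step (i): taxonomy-aware selection. Query over categories rather than
    # individual objects, then keep only objects in the selected categories.
    kept_categories = select_categories(goal, set(taxonomy.values()))
    kept = {o for o in state.objects if taxonomy.get(o) in kept_categories}

    # Step (ii): relationship-based refinement. Re-add objects linked to a
    # kept object through a relation judged relevant to the goal, repeating
    # until no new object is added.
    changed = True
    while changed:
        changed = False
        for subj, pred, obj in state.relations:
            exactly_one_kept = (subj in kept) != (obj in kept)
            if exactly_one_kept and relation_matters(goal, (subj, pred, obj)):
                kept.update((subj, obj))
                changed = True

    # Keep only relations whose endpoints both survive the pruning.
    kept_relations = frozenset(
        r for r in state.relations if r[0] in kept and r[2] in kept
    )
    return StateGraph(frozenset(kept), kept_relations)


# Hypothetical stand-ins for the LLM calls (hard-coded for illustration).
def fake_select_categories(goal: str, categories: Set[str]) -> Set[str]:
    return {"food", "tableware"} & categories


def fake_relation_matters(goal: str, relation: Relation) -> bool:
    return relation[1] in {"inside", "on"}


taxonomy = {
    "apple": "food", "plate": "tableware", "fridge": "appliance",
    "kitchen_table": "furniture", "sofa": "furniture", "tv": "electronics",
}
state = StateGraph(
    objects=frozenset(taxonomy),
    relations=frozenset({
        ("apple", "inside", "fridge"),
        ("plate", "on", "kitchen_table"),
        ("tv", "facing", "sofa"),
    }),
)
pruned = prune_state(
    state, taxonomy, fake_select_categories, fake_relation_matters,
    goal="put the apple on the plate",
)
print(sorted(pruned.objects))
```

In this toy example the category query discards the fridge and the kitchen table, and the refinement step restores them because the apple is inside the fridge and the plate is on the table. The TV and sofa are never reconnected to a kept object, so they stay out of the planning problem.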
Abstract
Planning methods struggle with computational intractability in solving task-level problems in large-scale environments. This work explores leveraging the commonsense knowledge encoded in LLMs to empower planning techniques to deal with these complex scenarios. We achieve this by efficiently using LLMs to prune irrelevant components from the planning problem's state space, substantially simplifying its complexity. We demonstrate the efficacy of this system through extensive experiments within a household simulation environment, alongside real-world validation using a 7-DoF manipulator (video https://youtu.be/6ro2UOtOQS4).
