LLM-State: Open World State Representation for Long-horizon Task Planning with Large Language Model
Siwei Chen, Anxing Xiao, David Hsu
TL;DR
The paper tackles long-horizon, open-world robot task planning by introducing LLM-State, a dynamic state representation that merges a structured object-centric ledger with an unstructured retrospective summary. This hybrid representation is maintained and updated by three LLM roles—Attention, State Estimator, and Policy—to track evolving object attributes and past failures, enabling robust planning under partial observability. Across simulated VirtualHome scenarios and a real Fetch robot, the approach yields substantial improvements over baselines, especially on hard tasks, and ablation studies confirm the critical roles of both the structured entries and the retrospective summary. The work demonstrates that integrating explicit, expandable world models with LLM-driven reasoning significantly enhances open-world, long-horizon planning in household environments.
Abstract
This work addresses long-horizon task planning with a Large Language Model (LLM) in an open-world household environment. Existing works either fail to explicitly track key objects and attributes, leading to erroneous decisions in long-horizon tasks, or rely on highly engineered state features and feedback, which do not generalize. We propose an open state representation that continuously expands and updates object attributes, drawing on the LLM's inherent capabilities for context understanding and historical action reasoning. The representation maintains a comprehensive record of each object's attributes and changes, enabling a robust retrospective summary of the sequence of actions leading to the current state. This yields a continuously updated world model that enhances context understanding for decision-making in task planning. We validate our model through experiments across simulated and real-world task planning scenarios, demonstrating significant improvements over baseline methods in a variety of tasks requiring long-horizon state tracking and reasoning. Video demonstration: https://youtu.be/QkN-8pxV3Mo
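To make the idea of an open state representation more concrete, the following is a minimal Python sketch of a container that pairs structured per-object attribute entries with a free-text retrospective summary. It is an illustration under stated assumptions, not the authors' implementation: the class name `OpenWorldState`, its fields and methods, and the example objects and attribute values are all hypothetical, and the LLM calls that would actually produce the updates are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class OpenWorldState:
    """Hypothetical container: structured object entries plus a free-text summary."""

    # Object name -> attribute name -> value. No fixed schema is assumed,
    # so new objects and new attribute keys can be added at any time.
    objects: dict[str, dict[str, str]] = field(default_factory=dict)
    # Unstructured summary of past actions and failures (assumed to be LLM-written).
    retrospective_summary: str = ""

    def update_object(self, name: str, **attributes: str) -> None:
        # Create the entry if the object is new, then add or overwrite attributes.
        self.objects.setdefault(name, {}).update(attributes)

    def to_prompt(self) -> str:
        # Serialize the state as plain text that could be placed in a planner prompt.
        lines = [
            f"{name}: " + ", ".join(f"{k}={v}" for k, v in attrs.items())
            for name, attrs in self.objects.items()
        ]
        lines.append(f"Summary: {self.retrospective_summary}")
        return "\n".join(lines)


# Illustrative values only; none of these come from the paper.
state = OpenWorldState()
state.update_object("mug", location="table", cleanliness="dirty")
state.update_object("mug", location="sink")  # an existing attribute changes
state.update_object("gripper", status="holding nothing")  # a new object appears
state.retrospective_summary = "Grasping the mug failed once; it was then moved to the sink."
prompt_text = state.to_prompt()
```

Keeping the object entries schema-free is what lets the record grow as new objects and attributes become relevant, while the separate summary string carries history that does not fit into any single object's attributes.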
