LLM-State: Open World State Representation for Long-horizon Task Planning with Large Language Model

Siwei Chen, Anxing Xiao, David Hsu

TL;DR

The paper tackles long-horizon, open-world robot task planning by introducing LLM-State, a dynamic state representation that merges a structured object-centric ledger with an unstructured retrospective summary. This hybrid representation is maintained and updated by three LLM roles—Attention, State Estimator, and Policy—to track evolving object attributes and past failures, enabling robust planning under partial observability. Across simulated VirtualHome scenarios and a real Fetch robot, the approach yields substantial improvements over baselines, especially on hard tasks, and ablation studies confirm the critical roles of both the structured entries and the retrospective summary. The work demonstrates that integrating explicit, expandable world models with LLM-driven reasoning significantly enhances open-world, long-horizon planning in household environments.
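As a rough illustration of the hybrid representation described above, the following minimal sketch pairs a structured object ledger with a free-text summary. The class name `LLMState`, its fields, and the helpers `update_object` and `to_prompt` are hypothetical names chosen for illustration, and the attribute values are made up; the paper's actual data format is not given in this summary.

```python
from dataclasses import dataclass, field


@dataclass
class LLMState:
    """Hypothetical container: structured object entries plus a free-text summary."""

    # object name -> {attribute name -> value}; new attributes may appear at any time
    objects: dict[str, dict[str, str]] = field(default_factory=dict)
    # unstructured retrospective summary of past actions and failures
    summary: str = ""

    def update_object(self, name: str, **attributes: str) -> None:
        """Add an unseen object, or extend/overwrite attributes of a known one."""
        self.objects.setdefault(name, {}).update(attributes)

    def to_prompt(self) -> str:
        """Render the state as plain text that could be placed in an LLM prompt."""
        lines = []
        for name, attrs in sorted(self.objects.items()):
            rendered = ", ".join(f"{k}={v}" for k, v in sorted(attrs.items()))
            lines.append(f"{name}: {rendered}")
        lines.append(f"Summary: {self.summary}")
        return "\n".join(lines)


# Illustrative values only: a food item moved from a fridge to a microwave.
state = LLMState()
state.update_object("food", location="fridge")
state.update_object("food", location="microwave", temperature="hot")
state.summary = "Took food from the fridge and heated it in the microwave."
print(state.to_prompt())
```

The ledger is an open dictionary rather than a fixed schema, so an attribute such as `temperature` can be introduced the first time it becomes relevant. In the described system that decision would come from an LLM rather than from hand-written calls as it does here.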

Abstract

This work addresses the problem of long-horizon task planning with a Large Language Model (LLM) in an open-world household environment. Existing works either fail to explicitly track key objects and attributes, leading to erroneous decisions in long-horizon tasks, or rely on highly engineered state features and feedback, which do not generalize. We propose an open state representation that continuously expands and updates object attributes, drawing on the LLM's inherent capabilities for context understanding and historical action reasoning. Our proposed representation maintains a comprehensive record of each object's attributes and changes, enabling a robust retrospective summary of the sequence of actions leading to the current state. This yields a continuously updated world model that enhances context understanding for decision-making in task planning. We validate our model through experiments across simulated and real-world task planning scenarios, demonstrating significant improvements over baseline methods in a variety of tasks requiring long-horizon state tracking and reasoning. (Video demonstration: https://youtu.be/QkN-8pxV3Mo)

Paper Structure (29 sections, 3 equations, 6 figures, 3 tables, 1 algorithm)

Figures (6)

  • Figure 1: LLM-State Example. The proposed state representation combines structured objects, whose attributes are automatically expanded and tracked by the LLM, with an unstructured summary of historical data. For instance, when the robot takes food from the fridge (step 1) and uses a microwave to heat it (step 2), LLM-State automatically expands and tracks previously unseen key object attributes through logic-based reasoning.
  • Figure 2: Overview of the system framework. The task planner consists of three components: LLM as Encoder, LLM as State Estimator, and LLM as Policy. The perception system outputs the observation to the task planner. In the task planner, the LLM encodes and tracks the LLM-State representation, which is used to assist plan generation. The generated action is executed by the low-level controller.
  • Figure 3: Example of the LLM-State representation.
  • Figure 4: LLM-State Example. After placing a slice of bread in the toaster, the robot fails to place another one (highlighted in red). The unstructured summary offers additional context about the failure (highlighted in green).
  • Figure 5: The five maps used in VirtualHome simulation.
  • ...and 1 more figure
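
The three-component planner outlined in the Figure 2 caption can be sketched as a single function that chains the roles. This is only a hedged illustration: `plan_step`, the `State` alias, and the three `stub_*` functions are hypothetical, and the stubs are trivial placeholders standing in for LLM calls. The perception system and low-level controller are outside the sketch, which takes an observation as input and stops at the chosen action string.

```python
from typing import Callable

# Assumed state shape for this sketch: object name -> {attribute -> value}
State = dict[str, dict[str, str]]


def plan_step(
    goal: str,
    observation: list[str],
    state: State,
    encode: Callable[[str, list[str]], list[str]],
    estimate: Callable[[State, list[str]], State],
    act: Callable[[str, State], str],
) -> tuple[State, str]:
    """One planner iteration: select objects, update the state, choose an action."""
    relevant = encode(goal, observation)   # role 1: pick task-relevant objects
    new_state = estimate(state, relevant)  # role 2: update the tracked state
    action = act(goal, new_state)          # role 3: choose the next action
    return new_state, action


# Stand-in roles so the sketch runs without any LLM; each is a placeholder.
def stub_encode(goal: str, observation: list[str]) -> list[str]:
    # Keep observed objects whose name appears in the goal text.
    return [obj for obj in observation if obj in goal]


def stub_estimate(state: State, relevant: list[str]) -> State:
    # Copy the old state and mark each relevant object as seen.
    updated = {name: dict(attrs) for name, attrs in state.items()}
    for obj in relevant:
        updated.setdefault(obj, {})["seen"] = "yes"
    return updated


def stub_act(goal: str, state: State) -> str:
    # Act on the first tracked object, or explore if nothing is tracked yet.
    return f"grab {sorted(state)[0]}" if state else "explore"


new_state, action = plan_step(
    "heat the food", ["food", "sofa"], {}, stub_encode, stub_estimate, stub_act
)
print(new_state, action)
```

Passing the roles in as callables keeps the control flow separate from any particular LLM client, so each stub could be replaced by a prompt-based implementation without changing `plan_step`.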