Statler: State-Maintaining Language Models for Embodied Reasoning
Takuma Yoneda, Jiading Fang, Peng Li, Huanyu Zhang, Tianchong Jiang, Shengjie Lin, Ben Picker, David Yunis, Hongyuan Mei, Matthew R. Walter
TL;DR
Statler introduces a state-maintaining approach to embodied reasoning in robotics. It uses two prompted LLM instances, a world-state reader and a world-state writer, so that each action is conditioned on an explicitly tracked estimate of the latent world state. Framed as a model-based extension of Code-as-Policies, Statler outperforms baselines on simulated tabletop tasks and in real-robot experiments, particularly for queries that require history-aware reasoning. Ablations show the value of separating the world-state reader from the writer and of maintaining an explicit external state rather than relying on the LLM's implicit internal memory. The work suggests scalability to longer-horizon planning and provides a modular prompt design with extensive demonstrations to bootstrap state maintenance in diverse domains.
Abstract
There has been significant research interest in employing large language models to empower intelligent robots with complex reasoning. Existing work focuses on harnessing their abilities to reason about the histories of their actions and observations. In this paper, we explore a new dimension in which large language models may benefit robotics planning. In particular, we propose Statler, a framework in which large language models are prompted to maintain an estimate of the world state, which is often unobservable, and to track its transitions as new actions are taken. Our framework then conditions each action on the estimate of the current world state. Despite being conceptually simple, our Statler framework significantly outperforms strong competing methods (e.g., Code-as-Policies) on several robot planning tasks. Additionally, it has the potential advantage of scaling up to more challenging long-horizon planning tasks.
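The read-then-write cycle described above can be illustrated with a minimal sketch. This is not the authors' implementation: the state format, the `step` function, and the `stub_reader` and `stub_writer` functions are all hypothetical. The two stubs are simple rule-based stand-ins for what would be prompted LLM calls, and the cup-and-ball scene is a toy example chosen for illustration.

```python
import copy
from typing import Callable, Dict, Optional, Tuple

# Hypothetical state format: which object (if any) each cup currently covers.
State = Dict[str, Optional[str]]
Reader = Callable[[State, str], str]    # (state, user query) -> action string
Writer = Callable[[State, str], State]  # (state, action) -> updated state


def step(state: State, query: str, reader: Reader, writer: Writer) -> Tuple[str, State]:
    """One cycle: pick an action from the current state estimate, then update the estimate."""
    action = reader(state, query)       # action is conditioned on the tracked state
    next_state = writer(state, action)  # state transition is tracked explicitly
    return action, next_state


def stub_reader(state: State, query: str) -> str:
    """Stand-in for a prompted LLM: lift the cup covering the object named last in the query."""
    target = query.split()[-1]
    for cup, covered in state.items():
        if covered == target:
            return f"lift {cup}"
    return "noop"


def stub_writer(state: State, action: str) -> State:
    """Stand-in for a prompted LLM: a lifted cup no longer covers anything."""
    next_state = copy.deepcopy(state)
    verb, _, cup = action.partition(" ")
    if verb == "lift" and cup in next_state:
        next_state[cup] = None
    return next_state


if __name__ == "__main__":
    state: State = {"red_cup": "ball", "blue_cup": None}
    for _ in range(2):
        action, state = step(state, "reveal the ball", stub_reader, stub_writer)
        print(action, state)
```

In the second cycle the reader returns `noop` because the updated state records that no cup covers the ball any longer. That decision depends on the explicitly maintained state rather than on re-reading the full interaction history.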
