Hindsight Planner: A Closed-Loop Few-Shot Planner for Embodied Instruction Following
Yuxiao Yang, Shenao Zhang, Zhihan Liu, Huaxiu Yao, Zhaoran Wang
TL;DR
This work reframes Embodied Instruction Following (EIF) as a Partially Observable Markov Decision Process (POMDP) and presents the Hindsight Planner, a closed-loop planner driven by few-shot prompting of a large language model (LLM). The approach combines three components: an adaptation module that infers the latent state, a long-horizon actor–critic planner (RAFA), and a novel hindsight relabeling mechanism that makes use of suboptimal trajectories during training and deployment. Key contributions include: (1) a POMDP-centric planning framework for EIF, (2) a hindsight prompting strategy that preserves task distributions while enriching learning signals, and (3) state-of-the-art few-shot performance on ALFRED, approaching or surpassing some full-shot supervised baselines. The results indicate substantial robustness gains on long-horizon tasks and highlight the practical potential of combining LLM-based reasoning with structured planning under partial observability.
Abstract
This work focuses on building a task planner for Embodied Instruction Following (EIF) using Large Language Models (LLMs). Previous works typically train a planner to imitate expert trajectories, treating this as a supervised task. While these methods achieve competitive performance, they often lack sufficient robustness. When a suboptimal action is taken, the planner may encounter an out-of-distribution state, which can lead to task failure. In contrast, we frame the task as a Partially Observable Markov Decision Process (POMDP) and aim to develop a robust planner under a few-shot assumption. Thus, we propose a closed-loop planner with an adaptation module and a novel hindsight method, aiming to use as much information as possible to assist the planner. Our experiments on the ALFRED dataset indicate that our planner achieves competitive performance under a few-shot assumption. For the first time, our few-shot agent's performance approaches and even surpasses that of the full-shot supervised agent.
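The hindsight idea mentioned above can be illustrated with a minimal sketch. It shows generic hindsight relabeling applied to few-shot prompting: a rollout that failed its original instruction is relabeled with the task it actually completed, so it can still serve as a valid in-context example. This is not the paper's implementation; the `Rollout` class, the `hindsight_relabel` and `build_prompt` helpers, the prompt layout, and the sub-goal strings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rollout:
    """Hypothetical record of one planner rollout (not from the paper)."""
    instruction: str          # task the agent was asked to complete
    subgoals: List[str]       # sub-goals the planner actually executed
    achieved: Optional[str]   # task the rollout ended up completing, or None


def hindsight_relabel(rollout: Rollout) -> Optional[Rollout]:
    """Relabel a rollout with the task it actually completed.

    Returns None when the rollout completed nothing usable.
    """
    if rollout.achieved is None:
        return None
    return Rollout(rollout.achieved, rollout.subgoals, rollout.achieved)


def build_prompt(examples: List[Rollout], query: str) -> str:
    """Format relabeled rollouts as few-shot examples, then append the query."""
    blocks = [
        f"Task: {ex.instruction}\nPlan: {', '.join(ex.subgoals)}"
        for ex in examples
    ]
    blocks.append(f"Task: {query}\nPlan:")
    return "\n\n".join(blocks)


# A rollout that failed its original instruction (the mug was never cleaned)
# but still completed a simpler task.
failed = Rollout(
    instruction="put a clean mug on the table",
    subgoals=["PickupObject Mug", "PutObject DiningTable"],
    achieved="put a mug on the table",
)
relabeled = hindsight_relabel(failed)
prompt = build_prompt([relabeled], "put a clean plate on the table")
print(prompt)
```

Under these assumptions, the relabeled rollout pairs the executed sub-goals with the instruction they actually satisfy, so a suboptimal trajectory becomes a correct demonstration for a different task rather than being discarded.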
