FlowPlan: Zero-Shot Task Planning with LLM Flow Engineering for Robotic Instruction Following
Zijun Lin, Chao Tang, Hanjing Ye, Hong Zhang
TL;DR
FlowPlan addresses the challenge of zero-shot robotic instruction following by introducing a structured four-stage LLM workflow (Task Information Retrieval, Language-Level Reasoning, Symbolic-Level Planning, Logical Evaluation) coupled with context-aligned target localization built on an online semantic map. This modular design enables robust grounding of lengthy instructions under operational constraints without labeled data, achieving strong performance on ALFRED and successful real-world deployments. The key contributions include a formalized multi-stage planning process, a context-aware grounding mechanism, and comprehensive ablations that underscore the importance of each component. The approach offers practical impact by reducing data requirements and adapting across diverse environments, with potential extensions to vision-language fusion and open-vocabulary perception.
Abstract
Robotic instruction-following tasks require seamless integration of visual perception, task planning, target localization, and motion execution. However, existing task planning methods for instruction following are either data-driven or underperform in zero-shot scenarios, because lengthy instructions are difficult to ground into actionable plans under operational constraints. To address this, we propose FlowPlan, a structured multi-stage LLM workflow that improves the zero-shot pipeline and bridges the performance gap between zero-shot and data-driven in-context learning methods. By decomposing the planning process into modular stages (task information retrieval, language-level reasoning, symbolic-level planning, and logical evaluation), FlowPlan generates logically coherent action sequences that adhere to operational constraints, and it further extracts contextual guidance for precise instance-level target localization. Benchmarked on ALFRED and validated in real-world applications, our method achieves competitive performance relative to data-driven in-context learning methods and demonstrates adaptability across diverse environments. This work advances zero-shot task planning in robotic systems without reliance on labeled data. Project website: https://instruction-following-project.github.io/.
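As an illustration only, the four-stage decomposition named in the abstract could be chained roughly as in the sketch below. This is not the FlowPlan implementation: the prompt tags, the `ALLOWED_ACTIONS` vocabulary, the `PlanResult` container, the "Action Target" line format, and the `stub_llm` stand-in for a real model are all assumptions made for the example, and the logical evaluation is reduced to a toy pick-before-put check.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]

# Hypothetical action vocabulary standing in for the robot's operational constraints.
ALLOWED_ACTIONS = {"GoTo", "PickUp", "Put", "Open", "Close"}

Action = Tuple[str, str]  # (action name, target object)


@dataclass
class PlanResult:
    task_info: str
    reasoning: str
    actions: List[Action] = field(default_factory=list)
    valid: bool = False


def parse_actions(text: str) -> List[Action]:
    """Parse lines of the assumed form 'Action Target' into (action, target) pairs."""
    actions: List[Action] = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) == 2:
            actions.append((parts[0], parts[1]))
    return actions


def evaluate(actions: List[Action]) -> bool:
    """Toy logical check: only known actions, and no Put without a prior PickUp."""
    holding = False
    for name, _target in actions:
        if name not in ALLOWED_ACTIONS:
            return False
        if name == "PickUp":
            if holding:
                return False
            holding = True
        elif name == "Put":
            if not holding:
                return False
            holding = False
    return bool(actions)


def run_workflow(instruction: str, llm: LLM) -> PlanResult:
    """Chain the four stages; each stage sees the output of the previous one."""
    task_info = llm(f"[retrieve] Extract goal and objects from: {instruction}")
    reasoning = llm(f"[reason] Describe the steps in plain language given: {task_info}")
    symbolic = llm(f"[plan] Convert to 'Action Target' lines: {reasoning}")
    actions = parse_actions(symbolic)
    return PlanResult(task_info, reasoning, actions, evaluate(actions))


def stub_llm(prompt: str) -> str:
    """Canned responses so the sketch runs without a real model."""
    if prompt.startswith("[retrieve]"):
        return "goal: apple inside fridge; objects: Apple, Fridge"
    if prompt.startswith("[reason]"):
        return "Walk to the apple, pick it up, walk to the fridge, open it, put the apple in, close it."
    return "GoTo Apple\nPickUp Apple\nGoTo Fridge\nOpen Fridge\nPut Fridge\nClose Fridge"


if __name__ == "__main__":
    result = run_workflow("Put the apple in the fridge", stub_llm)
    for action in result.actions:
        print(action)
    print("valid:", result.valid)
```

In a sketch like this, a plan that fails the final check would presumably be sent back to an earlier stage for revision; how FlowPlan actually handles failed evaluations is not specified in the abstract.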
