From Vague Instructions to Task Plans: A Feedback-Driven HRC Task Planning Framework based on LLMs

Afagh Mehri Shervedani, Matthew R. Walter, Milos Zefran

TL;DR

This work tackles planning in human-robot collaboration when human inputs are vague and implicit. It introduces IteraPlan, a hybrid framework that combines large language models with real-time human feedback to generate and refine executable task plans from concise prompts. The approach unites four components (decomposition of vague instructions, translation to executable code, real-time execution with iterative refinement, and affordance-based task allocation) to produce context-aware, adaptable plans, which are validated in the AI2-THOR simulation. The results show effective plan generation and refinement across diverse environments, and the open-source release enables practical adoption and further research.

Abstract

Recent advances in large language models (LLMs) have demonstrated their potential as planners in human-robot collaboration (HRC) scenarios, offering a promising alternative to traditional planning methods. LLMs, which can generate structured plans by reasoning over natural language inputs, have the ability to generalize across diverse tasks and adapt to human instructions. This paper investigates the potential of LLMs to facilitate planning in the context of human-robot collaborative tasks, with a focus on their ability to reason from high-level, vague human inputs, and fine-tune plans based on real-time feedback. We propose a novel hybrid framework that combines LLMs with human feedback to create dynamic, context-aware task plans. Our work also highlights how a single, concise prompt can be used for a wide range of tasks and environments, overcoming the limitations of long, detailed structured prompts typically used in prior studies. By integrating user preferences into the planning loop, we ensure that the generated plans are not only effective but aligned with human intentions.

Paper Structure

This paper contains 23 sections, 1 figure, 7 tables.

Figures (1)

  • Figure 1: Overview of the IteraPlan framework. The framework consists of four key components: (1) Task Decomposition, where vague, high-level instructions are transformed into structured task descriptions and subtasks; (2) Code Generation, where the subtasks are translated into executable Python code based on available skills and environmental constraints; (3) Real-Time Execution and Adaptive Code Refinement, where the generated code is executed within the AI2-THOR simulation environment, errors are detected, and iterative refinements are made to ensure feasibility; and (4) Affordance-Based Task Allocation, where executable code is adjusted to allocate actions between human and robot agents based on their capabilities.
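
The control flow of components (3) and (4) in the caption above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Step` type, the function names, the `(index, message)` error format, and the toy executor and refiner are all assumptions made for illustration. In the framework described here, the executor would be the AI2-THOR simulation and the refiner would be an LLM call; the sketch replaces both with plain functions so that it runs on its own. The action strings follow AI2-THOR naming but are used only as labels.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical error format: (index of the failing step, message).
ExecError = Tuple[int, str]


@dataclass
class Step:
    action: str           # label only, e.g. "PickupObject"
    target: str           # object the action applies to
    agent: str = "robot"  # filled in by allocate()


def refine_until_executable(
    plan: List[Step],
    execute: Callable[[List[Step]], Optional[ExecError]],
    refine: Callable[[List[Step], ExecError], List[Step]],
    max_rounds: int = 3,
) -> Tuple[List[Step], bool, int]:
    """Execute the plan; on failure, pass the error to `refine` and retry.

    Returns (plan, succeeded, number_of_refinements_used).
    """
    for round_idx in range(max_rounds + 1):
        error = execute(plan)
        if error is None:
            return plan, True, round_idx
        if round_idx == max_rounds:
            break
        plan = refine(plan, error)
    return plan, False, max_rounds


def allocate(
    plan: List[Step],
    affordances: Dict[str, Set[str]],
    preference: Sequence[str] = ("robot", "human"),
) -> List[Step]:
    """Assign each step to the first agent in `preference` that affords it."""
    allocated = []
    for step in plan:
        capable = [a for a in preference if step.action in affordances.get(a, set())]
        if not capable:
            raise ValueError(f"no agent can perform {step.action}")
        allocated.append(Step(step.action, step.target, capable[0]))
    return allocated


# --- Toy stand-ins (assumptions): a simulator and an LLM would go here. ---
def toy_execute(plan: List[Step]) -> Optional[ExecError]:
    """Fail if the apple is picked up before the fridge has been opened."""
    fridge_open = False
    for i, step in enumerate(plan):
        if step.action == "OpenObject" and step.target == "Fridge":
            fridge_open = True
        if step.action == "PickupObject" and step.target == "Apple" and not fridge_open:
            return (i, "Apple is inside a closed Fridge")
    return None


def toy_refine(plan: List[Step], error: ExecError) -> List[Step]:
    """Insert the missing precondition step before the failing step."""
    index, _message = error
    return plan[:index] + [Step("OpenObject", "Fridge")] + plan[index:]


initial_plan = [Step("PickupObject", "Apple"), Step("SliceObject", "Apple")]
final_plan, ok, rounds = refine_until_executable(initial_plan, toy_execute, toy_refine)

# Hypothetical capability table: only the human can slice.
affordances = {
    "robot": {"OpenObject", "PickupObject"},
    "human": {"OpenObject", "PickupObject", "SliceObject"},
}
for step in allocate(final_plan, affordances):
    print(step.agent, step.action, step.target)
```

In this toy run the first execution fails, one refinement inserts the missing `OpenObject` step, and allocation then assigns the slicing step to the human because the assumed robot capability set does not include it. Bounding the loop with `max_rounds` is a design choice of the sketch, not something stated in the source.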