LLM-guided Task and Motion Planning using Knowledge-based Reasoning

Muhayy Ud Din, Jan Rosell, Waseem Akram, Isiah Zaplana, Maximo A Roa, Irfan Hussain

TL;DR

The paper tackles the fragility of LLM-based task and motion planning (TAMP) caused by static, template-based prompts in dynamic environments. It introduces Onto-LLM-TAMP, a knowledge-based framework that enriches prompts with ontology-driven context, semantic tagging, and environment-state descriptions, which are then fed to an LLM to produce semantically correct action sequences. The architecture combines an Ontological Prompt Construction Layer with a Planning Layer, employing SPARQL-enabled contextual inference, SpaCy tagging, YOLO/FoundationPose perception, and RRTConnect motion planning, with a feedback loop to replan on failures. Empirical results in simulation and real-world scenarios show robust planning under ambiguous prompts, improved task and execution success, and competitive planning times across multiple LLMs. By integrating domain knowledge with LLM reasoning, the work makes TAMP more adaptive and semantically accurate, enabling more reliable robotic manipulation in dynamic settings.

Abstract

Performing complex manipulation tasks in dynamic environments requires efficient Task and Motion Planning (TAMP) approaches that combine high-level symbolic plans with low-level motion control. Advances in Large Language Models (LLMs), such as GPT-4, are transforming task planning by offering natural language as an intuitive and flexible way to describe tasks, generate symbolic plans, and reason. However, the effectiveness of LLM-based TAMP approaches is constrained by static, template-based prompting, which limits adaptability to dynamic environments and complex task contexts. To address these limitations, this work proposes a novel Onto-LLM-TAMP framework that employs knowledge-based reasoning to refine and expand user prompts with task-contextual reasoning and knowledge-based environment state descriptions. Integrating domain-specific knowledge into the prompt ensures semantically accurate and context-aware task plans. The proposed framework demonstrates its effectiveness by resolving semantic errors in symbolic plan generation, such as maintaining logical temporal goal ordering in scenarios involving hierarchical object placement. It is validated in both simulation and real-world scenarios, demonstrating significant improvements over the baseline approach in terms of adaptability to dynamic environments and the generation of semantically correct task plans.

Paper Structure

This paper contains 22 sections, 2 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: The Onto-LLM-TAMP framework enhances prompt elaboration for generating semantically accurate symbolic plans. It begins by processing the user input to extract actions and objects through semantic tagging. The Contextual Inference Engine uses SPARQL queries to retrieve object types and priorities from the ontology, ensuring the correct action sequence based on predefined rules. The Perception Module, with YOLO-based object detection and FoundationPose for object pose estimation, provides real-time spatial data. This information is textualized using ontological knowledge by the Environmental State Descriptor and fed into the Prompt Generator. The final prompt is then fed into the LLM Task Planner, which produces a structured task plan. Finally, the Motion Planner ensures the robot executes the task with feasible, collision-free movements.
  • Figure 2: Kitchen ontology, showing the hierarchy of Classes (yellow), Properties (green and blue), and Individuals (purple).
  • Figure 3: Illustrates how the Prompt Generator integrates information from the user input (green), the Contextual Inference module (yellow), the Environment State Description (blue), and the Prompt Template (black) to construct the final system prompt. The Generated Prompt incorporates structured environment data, action constraints, and reasoning to guide robotic decision-making effectively.
  • Figure 4: Example scenarios validating the proposed approach: the first row shows the initial states and the second row shows the goal states. Each scenario is used to perform multiple tasks, some of which are listed here: (A) Task: Put bowl, banana, and apple on the plate; (B) Task: Clean table, move sugar box, tomato can, and cracker box to the left table, move the plate and cup to the right table; (C) Task: Serve breakfast by placing plate, bread, and cup on the table; (D) Task: Stack plate1, plate2, and cup on plate3.
  • Figure 5: Sequence of snapshots of the following tasks: (A) Task: Put apple, banana, and bowl in plate; (B) Task: Clean table, move plate and cup to the right_table, move sugar_box, tomato_can, and cracker_box to the left_table.
  • ...and 6 more figures
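
The prompt-construction flow described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `ONTOLOGY` dictionary stands in for the SPARQL-queried kitchen ontology, the keyword matcher in `tag` stands in for SpaCy-based semantic tagging, the `Detection` records stand in for YOLO/FoundationPose output, and all names, priority values, and the prompt layout are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the ontology: object -> (type, placement priority).
# In the paper these facts are retrieved with SPARQL queries; the values here
# are invented for illustration (lower priority number = handled earlier).
ONTOLOGY = {
    "plate": ("Container", 0),
    "bowl": ("Container", 1),
    "banana": ("Food", 2),
    "apple": ("Food", 2),
}
ACTIONS = {"put", "move", "stack", "place"}


@dataclass
class Detection:
    name: str
    position: tuple  # (x, y, z) in metres, as a perception module might report


def tag(user_input):
    """Crude keyword tagger standing in for SpaCy-based semantic tagging."""
    tokens = [t.strip(",.").lower() for t in user_input.split()]
    actions = [t for t in tokens if t in ACTIONS]
    objects = [t for t in tokens if t in ONTOLOGY]
    return actions, objects


def order_by_priority(objects):
    """Sort by ontology priority; sorted() is stable, so ties keep input order."""
    return sorted(objects, key=lambda o: ONTOLOGY[o][1])


def describe_environment(detections):
    """Textualize each detection as one sentence with its ontology type."""
    return [
        f"{d.name} ({ONTOLOGY[d.name][0]}) is at "
        f"x={d.position[0]:.2f}, y={d.position[1]:.2f}, z={d.position[2]:.2f}."
        for d in detections
    ]


def build_prompt(user_input, detections):
    """Assemble task, environment description, and ordering hint into one prompt."""
    actions, objects = tag(user_input)
    ordered = order_by_priority(objects)
    lines = [
        f"Task: {user_input}",
        "Environment:",
        *describe_environment(detections),
        f"Actions mentioned: {', '.join(actions)}",
        f"Handle objects in this order: {', '.join(ordered)}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    scene = [Detection("plate", (0.5, 0.0, 0.8)), Detection("bowl", (0.3, 0.2, 0.8))]
    print(build_prompt("Put bowl, banana, and apple on the plate", scene))
```

The resulting string would then be passed to whichever LLM serves as the task planner. Keeping the ordering hint as an explicit line in the prompt is one simple way to expose ontology-derived priorities to the model; the paper's actual prompt template may be structured differently.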