RecipeMasterLLM: Revisiting RoboEarth in the Era of Large Language Models

Asil Kaan Bozcuoglu, Ziyuan Liu

TL;DR

This work presents RecipeMasterLLM, a framework that automates high-level robotic action planning by fine-tuning a small open-source LLM (CodeLLaMa) to generate action recipes aligned with the RoboEarth Knowledge Graph (RKG) and enhanced by Retrieval-Augmented Generation with digital twin context. The system tightly couples LLM-derived action recipes with a symbolic KG-based inference engine (SWI-Prolog) and a Robot Control Executive to execute grounded plans in a cloud robotics setting. Experimental results in a ROS2/O3DE simulation demonstrate effective prompt-driven task generation (e.g., serving drinks, removing objects, perceiving environments) and show favorable performance against SMART-LLM baselines, while also highlighting hallucination challenges that are mitigated through automatic verification against the RKG. Overall, the paper advances scalable, grounded, long-horizon robotic planning by combining open-source LLMs, semantic graphs, and RAG to enable autonomous manipulation and task execution in dynamic environments.
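
One way to picture the automatic verification against the RKG is as a vocabulary check: every action class and property emitted by the LLM should already exist in the knowledge graph, and anything else is flagged as a possible hallucination. The following is a minimal sketch under that assumption, not the authors' implementation. The vocabulary sets, the triple layout, and the function name `find_unknown_terms` are hypothetical; only `GraspObject`, `objectActedOn`, and `strawberry_1` are names taken from the paper's figure captions.

```python
# Hypothetical vocabulary. A real system would query these names from the RKG
# instead of hard-coding them.
KNOWN_CLASSES = {"GraspObject", "MoveTo", "HandOver"}
KNOWN_PROPERTIES = {"objectActedOn", "subAction"}


def find_unknown_terms(triples):
    """Return a sorted list of classes and properties missing from the vocabulary.

    Each triple is (action_class, property_name, value). Values are instance
    names and are not checked in this sketch.
    """
    unknown = set()
    for action_class, prop, _value in triples:
        if action_class not in KNOWN_CLASSES:
            unknown.add(action_class)
        if prop not in KNOWN_PROPERTIES:
            unknown.add(prop)
    return sorted(unknown)


# Illustrative recipe: the second entry uses a class that is not in the vocabulary.
recipe = [
    ("GraspObject", "objectActedOn", "strawberry_1"),
    ("TeleportTo", "objectActedOn", "kitchen_1"),
]

print(find_unknown_terms(recipe))
```

A recipe that yields a non-empty list could be rejected or sent back to the LLM for regeneration; which of the two the paper's system does is not stated in this summary.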

Abstract

RoboEarth was a pioneering initiative in cloud robotics, establishing a foundational framework for robots to share and exchange knowledge about actions, objects, and environments through a standardized knowledge graph. Initially, this knowledge was predominantly hand-crafted by engineers using RDF triples within OWL ontologies, with updates, such as changes in an object's pose, being asserted by the robot's control and perception routines. However, with the advent and rapid development of Large Language Models (LLMs), we believe that the process of knowledge acquisition can be significantly automated. To this end, we propose RecipeMasterLLM, a high-level planner that generates OWL action ontologies based on a standardized knowledge graph in response to user prompts. This architecture leverages a fine-tuned LLM specifically trained to understand and produce action descriptions consistent with the RoboEarth standardized knowledge graph. Moreover, during the Retrieval-Augmented Generation (RAG) phase, environmental knowledge is supplied to the LLM to enhance its contextual understanding and improve the accuracy of the generated action descriptions.

Paper Structure

This paper contains 18 sections, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: High-level concept diagram of the proposed system: The user provides a high-level goal to be delegated to the robot. This goal is processed by a fine-tuned large language model (LLM), which references the environment's semantic map (digital twin knowledge). The resulting action plan is then integrated into the RoboEarth inference system.
  • Figure 2: End-to-end pipeline during robot execution. Before execution, the semantic map of the environment is integrated into the RoboEarth Inference System. The process is triggered by a user prompt, which generates and asserts an action recipe. The robot then infers the necessary actions and retrieves relevant knowledge based on the provided prompt.
  • Figure 3: The household environment, Loft, and the robotic platform we are using in our experiments.
  • Figure 4: Graph representation of the action recipe for “Serve me a drink”, detailing required robot capabilities, subactions, and their temporal relationships (e.g., before-after). Key parameters, such as $objectActedOn$, are also specified for effective action parametrization.
  • Figure 5: The target object of $GraspObject$ for the prompt “Serve me the red small fruit” is $strawberry\_1$. Because the RKG does not contain the physical properties of fruits, CodeLLaMa infers the match from the description and explicitly states $strawberry\_1$ as the $objectActedOn$.
  • ...and 2 more figures
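
The before-after relationships between subactions described for Figure 4 form a directed graph, and an executable sequence can be read off it with a topological sort. The sketch below illustrates that idea only and is not the paper's inference mechanism, which runs in SWI-Prolog. The subaction names other than `GraspObject`, the dictionary layout, and the function name `execution_order` are assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical subactions for a "Serve me a drink" style recipe.
# Each key maps to the set of subactions that must finish before it starts.
before = {
    "GraspObject": {"MoveToObject"},
    "MoveToUser": {"GraspObject"},
    "HandOver": {"MoveToUser"},
}


def execution_order(predecessors):
    """Return one ordering of subactions that respects every before-after constraint."""
    return list(TopologicalSorter(predecessors).static_order())


order = execution_order(before)
print(order)
```

If the constraints contradict each other (for example, two subactions each required before the other), `static_order()` raises `graphlib.CycleError`, which is one simple way to detect an inconsistent recipe before execution.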