RecipeMasterLLM: Revisiting RoboEarth in the Era of Large Language Models
Asil Kaan Bozcuoglu, Ziyuan Liu
TL;DR
This work presents RecipeMasterLLM, a framework that automates high-level robotic action planning by fine-tuning a small open-source LLM (CodeLLaMa) to generate action recipes aligned with the RoboEarth Knowledge Graph (RKG) and enhanced by Retrieval-Augmented Generation with digital twin context. The system tightly couples LLM-derived action recipes with a symbolic KG-based inference engine (SWI-Prolog) and a Robot Control Executive to execute grounded plans in a cloud robotics setting. Experimental results in a ROS2/O3DE simulation demonstrate effective prompt-driven task generation (e.g., serving drinks, removing objects, perceiving environments) and show favorable performance against SMART-LLM baselines, while also highlighting hallucination challenges that are mitigated through automatic verification against the RKG. Overall, the paper advances scalable, grounded, long-horizon robotic planning by combining open-source LLMs, semantic graphs, and RAG to enable autonomous manipulation and task execution in dynamic environments.
Abstract
RoboEarth was a pioneering initiative in cloud robotics, establishing a foundational framework for robots to share and exchange knowledge about actions, objects, and environments through a standardized knowledge graph. Initially, this knowledge was predominantly hand-crafted by engineers as RDF triples within OWL ontologies, with updates, such as changes in an object's pose, asserted by the robot's control and perception routines. With the advent and rapid development of Large Language Models (LLMs), we believe that much of this knowledge acquisition process can be automated. To this end, we propose RecipeMasterLLM, a high-level planner that generates OWL action ontologies based on a standardized knowledge graph in response to user prompts. The architecture uses a fine-tuned LLM trained specifically to understand and produce action descriptions consistent with the RoboEarth standardized knowledge graph. Moreover, during the Retrieval-Augmented Generation (RAG) phase, environmental knowledge is supplied to the LLM to enhance its contextual understanding and improve the accuracy of the generated action descriptions.
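
The flow described above (retrieve environment facts, add them to the user prompt, then check the generated recipe against the knowledge graph) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Triple` type, the function names, and the keyword-overlap retrieval are all hypothetical stand-ins chosen for illustration, and a real system would likely use a proper retriever and query the RKG through its inference engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """Hypothetical stand-in for one RDF-style fact from the digital twin."""
    subject: str
    predicate: str
    obj: str


def retrieve_context(triples, prompt, limit=5):
    """Return the facts that share at least one word with the prompt.

    Assumption: plain keyword overlap replaces a real retriever here.
    """
    words = {w.strip(".,!?").lower() for w in prompt.split()}

    def score(t):
        terms = {t.subject.lower(), t.predicate.lower(), t.obj.lower()}
        return len(words & terms)

    ranked = sorted(triples, key=score, reverse=True)
    return [t for t in ranked if score(t) > 0][:limit]


def build_prompt(user_prompt, context):
    """Prepend the retrieved environment facts to the user's task."""
    lines = ["Environment facts:"]
    lines += [f"- {t.subject} {t.predicate} {t.obj}" for t in context]
    lines.append(f"Task: {user_prompt}")
    return "\n".join(lines)


def unknown_terms(recipe_terms, vocabulary):
    """List terms in a generated recipe that the knowledge graph does not define.

    Assumption: the vocabulary is given as a plain collection of names.
    """
    return sorted(set(recipe_terms) - set(vocabulary))


if __name__ == "__main__":
    facts = [
        Triple("cup", "on", "table"),
        Triple("robot", "in", "kitchen"),
    ]
    task = "Bring the cup from the table"
    print(build_prompt(task, retrieve_context(facts, task)))
    print(unknown_terms(["Grasp", "Teleport"], {"Grasp", "Move"}))
```

A non-empty result from `unknown_terms` would flag a possibly hallucinated action or object name, which a planner could reject or regenerate before execution.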
