How Can LLMs and Knowledge Graphs Contribute to Robot Safety? A Few-Shot Learning Approach

Abdulrahman Althobaiti, Angel Ayala, JingYing Gao, Ali Almutairi, Mohammad Deghat, Imran Razzak, Francisco Cruz

TL;DR

The paper tackles safety risks in NLP-driven drone control by introducing a safety layer that verifies LLM-generated code before execution. It combines Few-Shot learning to fine-tune GPT-4o for safe/unsafe code classification with Knowledge Graph Prompting to inject CASA drone regulations, evaluated in the AirSim environment. A 100-sample, four-category dataset is created and used to show that the fine-tuned, KG-enhanced classifier improves unsafe-command detection and binding safety, with constraints such as a maximum altitude of $120$ m and minimum distances of $30$ m. The work demonstrates that integrating domain-specific knowledge graphs with LLM reasoning can enhance safety in robot control without retraining base models, guiding safer NLP-driven robotics in dynamic outdoor settings.

Abstract

Large Language Models (LLMs) are transforming the robotics domain by enabling robots to comprehend and execute natural language instructions. The core benefits of LLMs include processing textual data from technical manuals, instructions, academic papers, and user queries based on the knowledge provided. However, deploying LLM-generated code in robotic systems without safety verification poses significant risks. This paper outlines a safety layer that verifies the code generated by ChatGPT before executing it to control a drone in a simulated environment. The safety layer consists of a GPT-4o model fine-tuned using Few-Shot learning and supported by knowledge graph prompting (KGP). Our approach improves the safety and compliance of robotic actions, ensuring that they adhere to the regulations of drone operations.
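
As a rough illustration of the idea of knowledge graph prompting, the sketch below renders a few regulation triples as plain-text facts placed ahead of the code to be classified, and pairs them with a simple numeric limit check. It is a minimal sketch under stated assumptions, not the authors' implementation: the triple names, the `build_kg_prompt` and `violates_limits` helpers, and the `PlannedMove` type are all hypothetical, and only the 120 m altitude limit and the 30 m minimum distance come from the text. No model is called here; in the paper the prompt would be passed to the fine-tuned GPT-4o classifier.

```python
from dataclasses import dataclass

# Hypothetical regulation triples (subject, predicate, value).
# Only the numeric limits (120 m, 30 m) are taken from the text.
REGULATION_TRIPLES = [
    ("drone", "must_stay_below_altitude_m", 120),
    ("drone", "must_keep_distance_from_people_m", 30),
]


def build_kg_prompt(code: str) -> str:
    """Render the triples as plain-text facts, followed by the code to classify."""
    facts = "\n".join(
        f"- {subj} {pred.replace('_', ' ')} {value}"
        for subj, pred, value in REGULATION_TRIPLES
    )
    return (
        "Regulations:\n"
        f"{facts}\n\n"
        "Classify the following drone code as 'safe' or 'unsafe':\n"
        f"{code}"
    )


@dataclass
class PlannedMove:
    """Hypothetical summary of a single drone command."""
    altitude_m: float
    distance_to_people_m: float


def violates_limits(move: PlannedMove) -> bool:
    """Return True if the move exceeds 120 m altitude or comes within 30 m of people."""
    return move.altitude_m > 120 or move.distance_to_people_m < 30


if __name__ == "__main__":
    # The command string is illustrative only; it is not executed.
    print(build_kg_prompt("client.moveToZAsync(-150, 5)"))
    print(violates_limits(PlannedMove(altitude_m=150, distance_to_people_m=50)))
```

A deterministic check like `violates_limits` only covers cases where the numbers are already known; the classifier described in the abstract is meant to judge generated code directly.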

Paper Structure

This paper contains 10 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Unsafe commands, especially those that ignore obstacles or boundaries, can cause the drone to crash into objects such as buildings, trees, or vehicles, resulting in physical damage to assets or humans.
  • Figure 2: Panel (a) depicts the system pipeline, showing the stages of human-robot interaction from command input to the drone executing the commands if they are classified as safe. Panel (b) illustrates the preparation and training of the classifier data, which produced a fine-tuned GPT-4o model.
  • Figure 3: Example of the Knowledge Graph Prompt used to classify generated code.
  • Figure 4: Classifier model integrated in the system pipeline.
  • Figure 5: Confusion matrices for the GPT-4o and FTGPT-4o models with and without KGP for each of the rule groups. The Altitude, DistCrowd, DistObject, and HoverCrowd rule groups are balanced, with totals of 8, 24, 24, and 24 examples, respectively. For the rules on maximum altitude (Altitude) and allowed distance to objects (DistObject), both models showed the same performance, which improved with the KGP approach. For the rules related to persons and crowds, both models struggle to perform well, even when using KGP. This outcome may be related to the high similarity with the instructions in the DistObject and Altitude groups. The highlighted model depicts our main contribution.