LLM-Driven Robots Risk Enacting Discrimination, Violence, and Unlawful Actions

Andrew Hundt, Rumaisa Azeem, Masoumeh Mansouri, Martim Brandão

TL;DR

This work evaluates how open-vocabulary LLMs, when deployed on robots for human-robot interaction, can produce direct discrimination and unsafe behaviors. It introduces two evaluation frameworks: a direct discrimination assessment using a broad set of protected-characteristic prompts, and a safety assessment that uses red-teaming prompts to test whether models accept harmful, illegal, or unsafe instructions. Across GPT-3.5, Mistral-7b, and Llama-3.1-8B, the study finds consistent discrimination on the basis of gender, disability, nationality, religion, and intersectional identities, as well as widespread safety failures in which harmful prompts are deemed acceptable or feasible. The paper argues for rigorous risk assessments, operation-domain scoping, and governance to prevent unsafe LLM-enabled robotics, and it provides publicly available code to enable replication and further auditing. Overall, the findings highlight urgent safety and fairness concerns for deploying LLM-driven robots in real-world HRI contexts and call for responsible design and testing regimes before widespread adoption.

Abstract

Members of the Human-Robot Interaction (HRI) and Machine Learning (ML) communities have proposed Large Language Models (LLMs) as a promising resource for robotics tasks such as natural language interaction, household and workplace tasks, approximating 'common sense reasoning', and modeling humans. However, recent research has raised concerns about the potential for LLMs to produce discriminatory outcomes and unsafe behaviors in real-world robot experiments and applications. To assess whether such concerns are well placed in the context of HRI, we evaluate several highly rated LLMs on discrimination and safety criteria. Our evaluation reveals that LLMs are currently unsafe for people across a diverse range of protected identity characteristics, including, but not limited to, race, gender, disability status, nationality, religion, and their intersections. Concretely, we show that LLMs produce directly discriminatory outcomes: for example, 'gypsy' and 'mute' people are labeled untrustworthy, but not 'european' or 'able-bodied' people. We find various such examples of direct discrimination on HRI tasks such as facial expression, proxemics, security, rescue, and task assignment. Furthermore, we test models in settings with unconstrained natural language (open vocabulary) inputs, and find they fail to act safely, generating responses that accept dangerous, violent, or unlawful instructions, such as incident-causing misstatements, taking people's mobility aids, and sexual predation. Our results underscore the urgent need for systematic, routine, and comprehensive risk assessments and assurances to improve outcomes and ensure LLMs only operate on robots when it is safe, effective, and just to do so. We provide code to reproduce our experiments at https://github.com/rumaisa-azeem/llm-robots-discrimination-safety.

Paper Structure

This paper contains 77 sections, 1 equation, 27 figures, and 8 tables.

Figures (27)

  • Figure 1: Summary of key findings with respect to selected LLM robot risks.
  • Figure 2: High-level concept for a task approval process based on a Figure corp. demo [figure_2024_figure1_robot_ai_video].
  • Figure 3: Direct Discrimination flowchart depicting the processing workflow for Tasks in Sec. \ref{sec:discrimination-method} in Table \ref{table:tasks}.
  • Figure 4: Probability assigned by GPT-3.5 (text-davinci-003) to a harmful action in various tasks when personal characteristics are part of the prompt. Rows are sorted by average probability across tasks. Highest probability of harmful actions: Mute, Gypsy, Blind, Paralyzed, ADHD, Child. The Collaboration Trust, Rescue, Expected Cleanliness, and Facial Expression tasks are among those with the most harmful assignments generated.
  • Figure 5: Probability assigned by Mistral-7b to a harmful action in various tasks when personal characteristics are part of the prompt. Rows are sorted by average probability across tasks. Highest probability of harmful actions: Black, Palestinian, Jewish, Gypsy, Muslim, Iraqi. Lowest probability of harmful actions: Japanese, Normal, European, Middle-aged, British, Nondisabled. The Facial Expression and Security tasks are among those with the most harmful assignments generated.
  • ...and 22 more figures
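
The captions for Figures 4 and 5 say that rows are sorted by average probability across tasks. A minimal sketch of that ordering step follows. It is an illustration only, not the authors' code: the row and task names (`descriptor_a`, `task_1`, and so on), the helper `sort_rows_by_mean`, and every numeric value are hypothetical placeholders, and the descending sort direction is an assumption, since the captions do not state it.

```python
from statistics import mean

# Hypothetical placeholder table: row label -> {task -> probability of the
# harmful action}. These numbers are NOT results from the paper.
harm_prob = {
    "descriptor_a": {"task_1": 0.10, "task_2": 0.30},
    "descriptor_b": {"task_1": 0.60, "task_2": 0.40},
    "descriptor_c": {"task_1": 0.05, "task_2": 0.15},
}


def sort_rows_by_mean(table):
    """Return (row, mean probability) pairs, highest mean first (assumed order)."""
    averages = {row: mean(cols.values()) for row, cols in table.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


ranked = sort_rows_by_mean(harm_prob)
for row, avg in ranked:
    print(f"{row}: {avg:.2f}")
```

Sorting on the per-row mean places the rows with the highest average harmful-action probability at the top, which is how a "highest probability" list like the ones in the captions could be read off such a table.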