Two Heads Are Better Than One: Collaborative LLM Embodied Agents for Human-Robot Interaction

Mitchell Rosser, Marc G. Carmichael

TL;DR

Multiple collaborative AI systems were tested against a single independent AI agent to determine whether their success in other domains would translate into improved human-robot interaction performance. The results show no defined trend between the number of agents and the success of the model.

Abstract

With the recent development of natural language generation models - termed large language models (LLMs) - a potential use case has opened up to improve the way that humans interact with robot assistants. These LLMs should be able to leverage their large breadth of understanding to interpret natural language commands into effective, task-appropriate and safe robot task executions. However, in reality, these models suffer from hallucinations, which may cause safety issues or deviations from the task. In other domains, these issues have been mitigated through the use of collaborative AI systems where multiple LLM agents can work together to collectively plan, code and self-check outputs. In this research, multiple collaborative AI systems were tested against a single independent AI agent to determine whether the success in other domains would translate into improved human-robot interaction performance. The results show that there is no defined trend between the number of agents and the success of the model. However, it is clear that some collaborative AI agent architectures can exhibit a greatly improved capacity to produce error-free code and to solve abstract problems.

Paper Structure

This paper contains 13 sections, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Experimental Environment. Quadrupedal robot is shown in environment with fiducial markers representing objects forming the task context
  • Figure 2: Architecture of Multi-Agent Cooperating AI system. The writer and safeguard collaborate to produce code as mediated by a group chat manager (Wu et al., 2023).
  • Figure 3: Testing Flowchart. Trial prompts are provided to LLM agent configs to produce executable Python code, which is then run on a Boston Dynamics Spot robot in front of a unique independent observer
  • Figure 4: Python code output from configuration B on Trial 1. Code demonstrates the system's ability to comprehend instructions, generate appropriate conditional logic and perform calculations to direct actions
  • Figure 5: AI system architecture diagram for configs A, B and C. Each system receives a prompt which is then processed by a number of AIs, self-determining when to send a final product as output.
  • ...and 5 more figures
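
The writer/safeguard arrangement described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation and does not use any real agent framework or robot API: `run_group_chat`, `toy_writer`, `toy_safeguard`, and the `robot.walk_to` command string are all hypothetical stand-ins, and the LLM calls are replaced by deterministic stubs so the control flow can be followed.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (speaker, content)


def run_group_chat(
    prompt: str,
    writer: Callable[[List[Message]], str],
    safeguard: Callable[[str], str],
    max_rounds: int = 3,
) -> Tuple[str, List[Message]]:
    """Hypothetical manager loop: the writer proposes code, the safeguard
    reviews it, and the loop ends once the safeguard approves."""
    history: List[Message] = [("user", prompt)]
    for _ in range(max_rounds):
        code = writer(history)
        history.append(("writer", code))
        verdict = safeguard(code)
        history.append(("safeguard", verdict))
        if verdict == "APPROVE":
            return code, history
    raise RuntimeError("no approved code within max_rounds")


def toy_writer(history: List[Message]) -> str:
    # Stub standing in for an LLM: lowers the speed after any rejection.
    rejected = any(s == "safeguard" and c.startswith("REJECT") for s, c in history)
    speed = 0.5 if rejected else 5.0
    return f"robot.walk_to('marker_3', speed={speed})"  # hypothetical command


def toy_safeguard(code: str) -> str:
    # Stub standing in for an LLM reviewer: only the slow speed is accepted.
    return "APPROVE" if "speed=0.5" in code else "REJECT: speed too high"


if __name__ == "__main__":
    final_code, transcript = run_group_chat("Go to marker 3", toy_writer, toy_safeguard)
    print(final_code)
    for speaker, content in transcript:
        print(f"{speaker}: {content}")
```

In this sketch the manager is only a loop with a round limit; a real system would also need to decide which agent speaks next and when the output is final, which the Figure 5 caption describes as self-determined by the agents.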