MARLIN: Multi-Agent Reinforcement Learning Guided by Language-Based Inter-Robot Negotiation

Toby Godfrey, William Hunt, Mohammad D. Soorati

TL;DR

MARLIN addresses the poor sample efficiency of multi-agent reinforcement learning (MARL) in multi-robot systems by injecting language-based inter-robot negotiation into the training loop. Off-the-shelf large language models (LLMs) generate natural-language plans that serve as high-level guidance for the MARL policy, and training dynamically switches between sampling from the policy's action distribution and following LLM-planned moves. The system reaches comparable or better task performance with significantly fewer training episodes across several corridor-navigation tasks, validated both in simulation and on physical TurtleBot3 robots. Because the LLMs are used without fine-tuning, the approach improves training efficiency and transparency, enables earlier deployment to hardware, and is intended to apply to a broader range of morphologies and environments.

Abstract

Multi-agent reinforcement learning is a key method for training multi-robot systems over a series of episodes in which robots are rewarded or punished according to their performance; only once the system is trained to a suitable standard is it deployed in the real world. If the system is not trained enough, the task will likely not be completed and could pose a risk to the surrounding environment. We introduce Multi-Agent Reinforcement Learning guided by Language-based Inter-Robot Negotiation (MARLIN), in which the training process requires fewer training episodes to reach peak performance. Robots are equipped with large language models that negotiate and debate a task, producing plans used to guide the policy during training. The approach dynamically switches between using reinforcement learning and large language model-based action negotiation throughout training. This reduces the number of training episodes required, compared to standard multi-agent reinforcement learning, and hence allows the system to be deployed to physical hardware earlier. The performance of this approach is evaluated against multi-agent reinforcement learning, showing that our hybrid method achieves comparable results with significantly reduced training time.
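
The dynamic switch between policy sampling and LLM-negotiated actions can be illustrated with a minimal Python sketch. The summary above does not state the switching criterion, so the rule used here (follow whichever source has the higher recent mean score) is purely an assumption for illustration, and the names `ActionSourceSwitch`, `record`, `choose`, and `sample_action` are hypothetical rather than taken from the paper.

```python
import random
from collections import deque
from statistics import mean


class ActionSourceSwitch:
    """Picks, per episode, whether actions come from the policy or an LLM plan.

    ASSUMPTION: the rule below (prefer the source with the higher recent
    mean score) is illustrative only; it is not the paper's stated criterion.
    """

    def __init__(self, window=5):
        # Keep only the most recent `window` scores for each source.
        self.policy_scores = deque(maxlen=window)
        self.llm_scores = deque(maxlen=window)

    def record(self, source, score):
        """Store the episode score obtained with the given action source."""
        if source == "policy":
            self.policy_scores.append(score)
        else:
            self.llm_scores.append(score)

    def choose(self):
        """Return "llm" or "policy" for the next episode."""
        if not self.llm_scores:
            return "llm"      # no LLM evidence yet: try a negotiated plan
        if not self.policy_scores:
            return "policy"   # no policy evidence yet: sample the policy
        if mean(self.llm_scores) > mean(self.policy_scores):
            return "llm"
        return "policy"


def sample_action(probs, rng):
    """Sample an action index from a discrete action distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


switch = ActionSourceSwitch()
first = switch.choose()            # nothing recorded yet
switch.record("llm", 0.8)
switch.record("policy", 0.2)
second = switch.choose()           # LLM plans currently score higher
for _ in range(5):
    switch.record("policy", 0.9)   # the policy improves with training
third = switch.choose()            # control returns to the policy
```

Under this assumed rule, LLM plans dominate early in training while the policy is weak, and control passes back to the policy once its recent scores overtake those of the negotiated plans.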

Paper Structure

This paper contains 8 sections, 6 figures, 1 table, 1 algorithm.

Figures (6)

  • Figure 1: Diagrams of the scenarios used for evaluation: (a) Asymmetrical Two Slot Corridor, (b) Symmetrical Two Slot Corridor, (c) Single Slot Corridor, (d) Two Path Corridor, (e) Maze-Like Corridor.
  • Figure 2: A diagram of the inter-agent negotiation mechanism. Both agents share the same system prompt, then alternate in suggesting a Top-Level Plan (TLP) and moves for both agents. Critiques can be provided to improve moves until they are agreed by the pair. Moves are then simulated and the next moves are discussed.
  • Figure 3: Median performance of the MARLIN and MARL systems for different scenarios in simulation. The boxplot shows the distribution of performance scores for trials using the LLM-based negotiation.
  • Figure 4: The environment and robot platform used for the physical robot experiments in the Maze-Like Corridor. Results are reported in Figure 5.
  • Figure 5: Median performance of the system for the Maze-Like Corridor when executed on physical hardware. The boxplot shows the performance distribution when the LLM-only system was evaluated on physical hardware.
  • ...and 1 more figure
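
The propose-and-critique cycle described in the Figure 2 caption can be sketched as a short loop. This is a minimal illustration, not the authors' implementation: `ScriptedAgent` is a hard-coded stand-in for an LLM-backed agent, and `negotiate_moves`, the `max_rounds` cap, and the move strings are all hypothetical names and values introduced here.

```python
from typing import Optional, Set, Tuple

Moves = Tuple[str, str]  # one proposed move per robot


class ScriptedAgent:
    """Hard-coded stand-in for an LLM-backed negotiating agent (hypothetical)."""

    def __init__(self, first_offer: Moves, fallback: Moves, acceptable: Set[Moves]):
        self.first_offer = first_offer
        self.fallback = fallback
        self.acceptable = acceptable

    def propose(self, feedback: Optional[str]) -> Moves:
        # Without feedback, open with the first offer; otherwise revise.
        return self.first_offer if feedback is None else self.fallback

    def critique(self, moves: Moves) -> Optional[str]:
        # None signals agreement; any string is a critique to address.
        return None if moves in self.acceptable else "robots would collide"


def negotiate_moves(proposer, critic, max_rounds: int = 3) -> Tuple[Moves, bool]:
    """Alternate proposals and critiques until agreement or the round cap.

    ASSUMPTION: the round cap is an illustrative safeguard against endless
    debate; the summary does not say how the original system bounds it.
    """
    moves = proposer.propose(None)
    for _ in range(max_rounds):
        feedback = critic.critique(moves)
        if feedback is None:
            return moves, True           # both agents agree on these moves
        moves = proposer.propose(feedback)
    return moves, False                  # no agreement within the cap


ok = {("forward", "wait")}
alice = ScriptedAgent(("forward", "forward"), ("forward", "wait"), ok)
bob = ScriptedAgent(("wait", "forward"), ("forward", "wait"), ok)

agreed_moves, agreed = negotiate_moves(alice, bob)
capped_moves, capped_agreed = negotiate_moves(alice, bob, max_rounds=1)
```

In a full system, the agreed moves would then be simulated and the agents would swap proposer and critic roles for the next step, matching the alternation described in the caption.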