MARLIN: Multi-Agent Reinforcement Learning Guided by Language-Based Inter-Robot Negotiation
Toby Godfrey, William Hunt, Mohammad D. Soorati
TL;DR
MARLIN tackles the poor sample efficiency of MARL in multi-robot systems by injecting language-based inter-agent negotiation into the training loop. It uses off-the-shelf LLMs to generate natural-language plans that serve as high-level guidance for the MARL policy, switching dynamically between sampling from the policy's action distribution and following LLM-planned moves. The system reaches comparable or better task performance in significantly fewer training episodes across several corridor-navigation tasks, validated both in simulation and on physical TurtleBot3 robots. Because the LLMs are used for planning without fine-tuning, the approach improves training efficiency and transparency, enables earlier deployment to hardware, and can be applied to a broader range of morphologies and environments.
Abstract
Multi-agent reinforcement learning is a key method for training multi-robot systems over a series of episodes in which robots are rewarded or punished according to their performance; only once the system is trained to a suitable standard is it deployed in the real world. If the system is not trained enough, the task will likely not be completed and the system could pose a risk to the surrounding environment. We introduce Multi-Agent Reinforcement Learning guided by Language-based Inter-Robot Negotiation (MARLIN), in which the training process requires fewer training episodes to reach peak performance. Robots are equipped with large language models that negotiate and debate a task, producing plans used to guide the policy during training. The approach dynamically switches between using reinforcement learning and large language model-based action negotiation throughout training. This reduces the number of training episodes required, compared to standard multi-agent reinforcement learning, and hence allows the system to be deployed to physical hardware earlier. The performance of this approach is evaluated against multi-agent reinforcement learning, showing that our hybrid method achieves comparable results with significantly reduced training time.
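The dynamic switch between reinforcement learning and negotiated plans can be illustrated with a minimal Python sketch. The abstract does not state the switching criterion, so the rule used here (follow the plan while its recent mean return exceeds that of the policy) is an assumption made purely for illustration, and the names `should_use_plan`, `select_action`, and `sample_from_policy` are hypothetical rather than taken from the authors' implementation.

```python
import random
from statistics import mean
from typing import Optional, Sequence


def sample_from_policy(action_probs: Sequence[float], rng: random.Random) -> int:
    """Sample an action index from the policy's action distribution."""
    return rng.choices(range(len(action_probs)), weights=action_probs, k=1)[0]


def should_use_plan(policy_returns: Sequence[float],
                    plan_returns: Sequence[float]) -> bool:
    """Hypothetical switching rule (an assumption, not the paper's stated rule):
    follow the negotiated plan while its mean return is higher than the policy's."""
    if not plan_returns:
        return False  # no evidence for the plan yet, so fall back to the policy
    if not policy_returns:
        return True   # only the plan has been evaluated so far
    return mean(plan_returns) > mean(policy_returns)


def select_action(action_probs: Sequence[float],
                  planned_action: Optional[int],
                  use_plan: bool,
                  rng: random.Random) -> int:
    """Return the planned action when the switch selects the plan and a plan
    exists for this step; otherwise sample from the policy distribution."""
    if use_plan and planned_action is not None:
        return planned_action
    return sample_from_policy(action_probs, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    use_plan = should_use_plan(policy_returns=[1.0, 2.0], plan_returns=[3.0])
    action = select_action([0.2, 0.5, 0.3], planned_action=2,
                           use_plan=use_plan, rng=rng)
    print(use_plan, action)
```

In this sketch the plan only overrides which action is executed; the resulting transitions would still be passed to the MARL update, which is one way plans could guide the policy during training.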
