Language Grounded Multi-agent Reinforcement Learning with Human-interpretable Communication

Huao Li, Hossein Nourkhiz Mahjoub, Behdad Chalaki, Vaishnav Tadiparthi, Kwonjoon Lee, Ehsan Moradi-Pari, Charles Michael Lewis, Katia P Sycara

TL;DR

This paper proposes a computational pipeline that aligns the communication space of MARL agents with an embedding space of human natural language, grounding agent communications on synthetic data generated by embodied Large Language Models in interactive teamwork scenarios. The results show that language grounding not only maintains task performance but also accelerates the emergence of communication.

Abstract

Multi-Agent Reinforcement Learning (MARL) methods have shown promise in enabling agents to learn a shared communication protocol from scratch and accomplish challenging team tasks. However, the learned language is usually not interpretable to humans or other agents not co-trained together, limiting its applicability in ad-hoc teamwork scenarios. In this work, we propose a novel computational pipeline that aligns the communication space between MARL agents with an embedding space of human natural language by grounding agent communications on synthetic data generated by embodied Large Language Models (LLMs) in interactive teamwork scenarios. Our results demonstrate that introducing language grounding not only maintains task performance but also accelerates the emergence of communication. Furthermore, the learned communication protocols exhibit zero-shot generalization capabilities in ad-hoc teamwork scenarios with unseen teammates and novel task states. This work presents a significant step toward enabling effective communication and collaboration between artificial agents and humans in real-world teamwork settings.

Paper Structure (35 sections, 4 equations, 11 figures, 5 tables)

This paper contains 35 sections, 4 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Illustrations of our proposed computational pipeline, LangGround. The framework consists of three modules: 1) collecting grounded communication from LLM agents, 2) aligning MARL communication with language grounds, 3) translating aligned communication vectors into natural language messages via cosine similarity matching.
  • Figure 1: Similarity gain w/ LangGround
  • Figure 2: Learning curves of LangGround in comparison with baseline methods. The y-axis is task performance, measured by the episode length until task completion (lower is better). The x-axis is the number of training timesteps. Shaded areas are standard errors over three random seeds.
  • Figure 3: Learned communication embedding space. Communication vectors between agents in $pp_{v0}$ are visualized with t-SNE and clustered with DBSCAN. Two semantically meaningful clusters are identified as examples, each corresponding to a specific agent observation. We also present the most similar reference message from dataset $\mathcal{D}$ to illustrate the alignment between the agent communication space and the human language embedding space.
  • Figure 4: Team performance of LangGround and baselines on Predator Prey with 10 by 10 map and vision range of 1.
  • ...and 6 more figures
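
The third module named in the Figure 1 caption, translating an aligned communication vector into a natural language message via cosine similarity matching, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy 3-dimensional embeddings, and the reference messages below are hypothetical and stand in for the real dataset $\mathcal{D}$ and its language embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def translate(comm_vector, reference):
    """Return the (message, similarity) pair whose embedding is closest to comm_vector.

    `reference` is a list of (message, embedding) pairs; here it is a
    hypothetical stand-in for the reference messages in the dataset.
    """
    best_message, best_similarity = None, -2.0  # below the minimum possible cosine
    for message, embedding in reference:
        similarity = cosine_similarity(comm_vector, embedding)
        if similarity > best_similarity:
            best_message, best_similarity = message, similarity
    return best_message, best_similarity


# Toy reference set with made-up embeddings (assumption, for illustration only).
reference = [
    ("I see the prey to my left", [0.9, 0.1, 0.0]),
    ("Nothing in sight", [0.0, 0.2, 0.9]),
]

# A made-up communication vector produced by an agent.
message, score = translate([0.8, 0.2, 0.1], reference)
print(message, round(score, 3))
```

A linear scan is enough for a handful of reference messages; with a large reference set, the embeddings would normally be normalized once and compared in a single matrix product.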