Safe Multi-agent Reinforcement Learning with Natural Language Constraints

Ziyan Wang; Meng Fang; Tristan Tomilin; Fei Fang; Yali Du

TL;DR

This work tackles safe multi-agent reinforcement learning (MARL) when constraints are supplied in natural language. It introduces SMALL, a pipeline that uses fine-tuned language models to condense free-form constraints, a cost-learning module that produces constraint embeddings and predicts per-agent costs, and PPO-based multi-agent policy updates with a Lagrangian term that enforces safety without access to ground-truth costs. The accompanying LaMaSafe benchmark provides grid-world and 3D tasks with free-form natural language constraints for evaluating safety and coordination. Empirically, SMALL attains rewards comparable to standard MARL algorithms while substantially reducing constraint violations, showing that natural language constraints can be understood and enforced, and offering a scalable path toward safer, more accessible MARL in real-world domains.
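The Lagrangian term mentioned above can be illustrated with the standard dual-ascent scheme used in constrained RL. The sketch below is a minimal illustration under stated assumptions, not SMALL's implementation: `LagrangeMultiplier`, `penalized_advantage`, the cost limit and the step size are hypothetical names and values, a single multiplier is assumed to be shared by all agents, and the episode cost fed to it would come from predicted costs rather than ground-truth costs.

```python
from dataclasses import dataclass


@dataclass
class LagrangeMultiplier:
    """Hypothetical dual variable for one constraint E[episode cost] <= cost_limit."""
    cost_limit: float   # allowed expected episode cost (assumed hyperparameter)
    lr: float = 0.05    # dual step size (assumed)
    value: float = 0.0  # current lambda, always kept >= 0

    def update(self, mean_episode_cost: float) -> float:
        """Projected gradient ascent on lambda.

        lambda grows while the (predicted) episode cost exceeds the limit and
        shrinks toward 0 once the constraint is satisfied.
        """
        violation = mean_episode_cost - self.cost_limit
        self.value = max(0.0, self.value + self.lr * violation)
        return self.value


def penalized_advantage(reward_adv: float, cost_adv: float, lam: float) -> float:
    """Advantage plugged into the PPO clipped surrogate: the reward advantage
    minus the lambda-weighted cost advantage (hypothetical helper)."""
    return reward_adv - lam * cost_adv


# Usage: lambda rises after a violating batch, then decays once costs drop.
multiplier = LagrangeMultiplier(cost_limit=1.0, lr=0.5)
multiplier.update(3.0)   # violation of 2.0 -> lambda = 1.0
multiplier.update(0.0)   # slack of 1.0 -> lambda = 0.5
```

Because lambda moves only in proportion to the measured violation, a cost model that systematically under-predicts violations would also keep the penalty too small, which is why the accuracy of the learned costs matters for safety.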

Abstract

Natural language constraints play a crucial yet often overlooked role in safe multi-agent reinforcement learning (MARL). Safe MARL has vast potential in fields such as robotics and autonomous vehicles, but its adoption is limited by the need to define constraints in pre-designed mathematical terms, which requires extensive domain expertise and reinforcement learning knowledge. To make safe MARL more accessible and adaptable, we propose Safe Multi-agent Reinforcement Learning with Natural Language constraints (SMALL). Our method uses fine-tuned language models to interpret and process free-form textual constraints, converting them into semantic embeddings that capture which states and behaviours are prohibited. These embeddings are then integrated into the multi-agent policy learning process, enabling agents to learn policies that minimize constraint violations while optimizing rewards. To evaluate SMALL, we introduce LaMaSafe, a multi-task benchmark designed to assess how well multiple agents adhere to natural language constraints. Empirical evaluations across various environments show that SMALL achieves rewards comparable to the baselines with significantly fewer constraint violations, highlighting its effectiveness in understanding and enforcing natural language constraints.
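To make the embedding-to-cost step concrete, the sketch below replaces the fine-tuned language models and the learned cost predictor with a toy stand-in: a bag-of-words vector for each text and a cosine-similarity threshold that flags a per-agent cost of 0 or 1. The functions `embed`, `cosine` and `predict_costs`, and the 0.3 threshold, are hypothetical and only illustrate the interface (constraint text plus per-agent observation text in, one predicted cost per agent out); they are not the method described in the paper.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (hypothetical stand-in for the encoder LM)."""
    return Counter(text.lower().replace(".", " ").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict_costs(constraint: str, agent_obs: list[str], threshold: float = 0.3) -> list[int]:
    """Return one predicted cost per agent: 1 if that agent's text observation is
    similar enough to the constraint, else 0 (hypothetical threshold rule)."""
    e_l = embed(constraint)                 # plays the role of the constraint embedding E_l
    costs = []
    for obs in agent_obs:
        e_o = embed(obs)                    # plays the role of agent i's observation embedding
        costs.append(int(cosine(e_l, e_o) >= threshold))
    return costs


print(predict_costs("Do not step on lava.", ["Agent 1 is on lava.", "Agent 2 is on grass."]))
# prints [1, 0]
```

A word-overlap rule like this cannot tell "do not step on lava" apart from "the agent avoided the lava"; handling negation and paraphrase is the kind of semantic understanding the language models in SMALL are intended to provide.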

Paper Structure (30 sections, 7 equations, 7 figures, 2 tables, 2 algorithms)

Introduction
Related Work
Preliminaries
Methodology
Language Constrained Markov Game
Cost Learning Module
Multi-Agent Policy Learning with Constraints
LaMaSafe Benchmark
Experiments
Setup
Main Results
Ablation Study
Ablation on SMALL components.
Conclusion
Broader Impact Statement
...and 15 more sections

Figures (7)

Figure 1: The framework of SMALL. Humans first write natural language constraints for the environment and agents. SMALL uses a decoder language model to condense these instructions into their core semantic meaning, removing ambiguity and redundancy. An encoder language model then encodes the condensed constraints and the environment descriptions from the text-based observations into embeddings $E_l$ and ${E}^i_{o,t}$ according to their semantic meaning. Next, the cost prediction model takes these embeddings as input and predicts whether the constraint is violated, i.e., the cost $\hat{c}^n_t$ for each agent. Finally, the policy network is updated using the predicted costs and the embeddings.
Figure 2: LaMaSafe Benchmark. (a) Grid: two agents in the Random layout, with 20 randomly placed lava, water and grass tiles. (b) Goal(Ant): two agents in the Hard level layout, where each agent controls the four joints of an ant to navigate. The numbers accompanying each level indicate the obstacle counts, where "H" and "V" denote hazards and vases, respectively; task difficulty increases with the number of hazards and vases. (c) Examples of natural language constraints used in our evaluation. (d) Examples of environment descriptions provided by the environments.
Figure 3: Comparison with Natural Language Constraints: We compare four algorithms, MAPPO, HAPPO, SMALL-MAPPO and SMALL-HAPPO, on LaMaSafe-Grid and LaMaSafe-Goal, evaluating rewards and costs across different agent types and layouts. All algorithms are given only the natural language constraints. For a fair comparison, we augment the state of MAPPO and HAPPO with the constraint embedding $E_l$.
Figure 4: (a) Four-Agent Comparison: SMALL versus MAPPO and HAPPO on the Easy and Hard levels of LaMaSafe-Goal(Ant) with four agents. (b) Ground-Truth Cost Comparison: SMALL versus MAPPO-Lagrange and HAPPO-Lagrange on the Hard level of Goal(Ant) with four agents.
Figure 5: LaMaSafe-Grid. (a) Two agents in the Random layout (10 × 10 grid) with 20 randomly placed lava, water and grass tiles. (b) Two agents in One-Path layout,
...and 2 more figures
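The Random layout of LaMaSafe-Grid (a 10 × 10 grid with two agents and 20 randomly placed lava, water and grass tiles) can be mimicked with a short generator. This is a hypothetical sketch, not the benchmark's code: `random_layout` and `render` are illustrative names, and it assumes each of the 20 terrain cells draws its type uniformly at random and that agents start on distinct, terrain-free cells.

```python
import random

TILE_TYPES = ("lava", "water", "grass")  # terrain types named in the captions


def random_layout(size=10, n_terrain=20, n_agents=2, seed=None):
    """Hypothetical Random-layout generator.

    Returns (terrain, agents): terrain maps (row, col) -> tile type,
    agents is a list of (row, col) start positions.
    """
    rng = random.Random(seed)
    cells = [(r, c) for r in range(size) for c in range(size)]
    # Sample distinct cells so terrain and agents never overlap (assumption).
    chosen = rng.sample(cells, n_terrain + n_agents)
    terrain = {cell: rng.choice(TILE_TYPES) for cell in chosen[:n_terrain]}
    agents = chosen[n_terrain:]
    return terrain, agents


def render(terrain, agents, size=10):
    """ASCII view: L/W/G for terrain, agent index for agents, '.' for empty."""
    symbols = {"lava": "L", "water": "W", "grass": "G"}
    rows = []
    for r in range(size):
        row = ""
        for c in range(size):
            if (r, c) in agents:
                row += str(agents.index((r, c)))
            else:
                row += symbols.get(terrain.get((r, c)), ".")
        rows.append(row)
    return "\n".join(rows)


terrain, agents = random_layout(seed=0)
print(render(terrain, agents))
```

Passing a fixed `seed` makes a layout reproducible, which is convenient when the same map must be paired with several different natural language constraints.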