An Organizationally-Oriented Approach to Enhancing Explainability and Control in Multi-Agent Reinforcement Learning
Julien Soulé, Jean-Paul Jamont, Michel Occello, Louis-Marie Traonouez, Paul Théron
TL;DR
The paper tackles the challenge of explainability and control in multi-agent reinforcement learning by embedding the explicit organizational model $\mathcal{M}OISE^+$ into the MARL framework via MOISE+MARL. It introduces TEMM, a trajectory-based unsupervised method to infer implicit roles, goals, and obligations from observed behaviors and to quantify organizational fit. Empirical results across four Dec-POMDP environments show that enforcing $\mathcal{M}OISE^+$ constraints improves organizational fit, convergence, and robustness, with clear advantages over AGR+MARL and across multiple MARL algorithms. The approach offers a modular, externally-guided mechanism to shape cooperative behavior, enabling more predictable and auditable multi-agent systems, while outlining practical limitations and directions for dynamic adaptation and automation of organizational specifications.
Abstract
Multi-Agent Reinforcement Learning (MARL) can lead to the development of collaborative agent behaviors that show similarities with organizational concepts. Pushing forward this perspective, we introduce a novel framework that explicitly incorporates organizational roles and goals from the $\mathcal{M}OISE^+$ model into the MARL process, guiding agents to satisfy the corresponding organizational constraints. By structuring training with roles and goals, we aim to enhance both the explainability and control of agent behaviors at the organizational level, whereas most of the literature focuses on individual agents. Additionally, our framework includes a post-training analysis method to infer implicit roles and goals, offering insights into emergent agent behaviors. This framework has been applied across various MARL environments and algorithms, demonstrating coherence between predefined organizational specifications and those inferred from trained agents.
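To make the idea of guiding agents with role constraints more concrete, the following is a minimal illustrative sketch, not the MOISE+MARL implementation. It assumes a role can be approximated by a rule mapping an observation to a set of permitted actions; the names `RoleRule`, `scout_rule`, `constrained_action`, and `compliance_rate` are hypothetical and do not come from the paper. The compliance rate shown here is only a simple stand-in for an organizational-fit measure and is much simpler than the unsupervised TEMM analysis.

```python
from typing import Callable, List, Sequence, Set, Tuple

# Hypothetical role specification: observation -> set of permitted action indices.
RoleRule = Callable[[str], Set[int]]


def scout_rule(obs: str) -> Set[int]:
    """Toy role (an assumption for illustration): a scout may only move
    (actions 0 and 1) while nothing is visible, and may do anything otherwise."""
    return {0, 1} if obs == "nothing_visible" else {0, 1, 2, 3}


def constrained_action(q_values: Sequence[float], obs: str, rule: RoleRule) -> int:
    """Return the highest-valued action among those the role permits."""
    allowed = sorted(rule(obs))
    return max(allowed, key=lambda a: q_values[a])


def compliance_rate(trajectory: List[Tuple[str, int]], rule: RoleRule) -> float:
    """Fraction of (observation, action) steps that respect the role rule."""
    if not trajectory:
        return 0.0
    compliant = sum(1 for obs, act in trajectory if act in rule(obs))
    return compliant / len(trajectory)


q = [0.1, 0.2, 0.9, 0.3]
masked_choice = constrained_action(q, "nothing_visible", scout_rule)
free_choice = constrained_action(q, "target_visible", scout_rule)

observed = [
    ("nothing_visible", 2),  # violates the toy role
    ("nothing_visible", 1),
    ("target_visible", 3),
    ("target_visible", 2),
]
fit = compliance_rate(observed, scout_rule)
```

In this sketch the constraint is applied outside the learning algorithm, as a mask on the agent's action choice, which mirrors the externally guided character of the approach only loosely; a constraint could equally be expressed as a reward penalty.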
