Combining Planning and Reinforcement Learning for Solving Relational Multiagent Domains
Nikhilesh Prabhakar, Ranveer Singh, Harsha Kokel, Sriraam Natarajan, Prasad Tadepalli
TL;DR
MaRePReL tackles the sample inefficiency and non-stationarity of relational multiagent reinforcement learning by integrating a relational hierarchical planner as a centralized controller with task-specific state abstractions and low-level deep RL. The problem is formalized as a goal-directed relational Markov game (GRMG), solved through a planner-driven task distributor and operator-specific RL policies, guided by dynamic D-FOCI abstractions. The work presents the first relational multiagent system that generalizes across different numbers of objects and relations, demonstrates a cohesive architecture combining planning, abstraction, and learning, and shows superior sample efficiency, transfer, and generalization across three relational domains. It also notes practical limitations and points to future directions in scaling, partial observability, and differentiable end-to-end implementations.
Abstract
Multiagent Reinforcement Learning (MARL) poses significant challenges due to the exponential growth of state and action spaces and the non-stationary nature of multiagent environments. These factors lead to notable sample inefficiency and hinder generalization across diverse tasks. The complexity is further pronounced in relational settings, where domain knowledge is crucial but often underutilized by existing MARL algorithms. To overcome these hurdles, we propose integrating relational planners as centralized controllers with efficient state abstractions and reinforcement learning. This approach proves to be sample-efficient and facilitates effective task transfer and generalization.
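To make the division of labor concrete, here is a minimal Python sketch of the control flow described above: a centralized planner splits the goal into per-agent tasks, a state abstraction trims each agent's view, and a separate low-level policy per operator picks the action. Everything in it is an illustrative assumption rather than the authors' implementation: the toy taxi-style facts, the names `plan`, `abstract`, `POLICIES`, and `joint_action`, the round-robin task assignment, and the naive relevance filter that stands in for the paper's abstraction method. The policies are hand-written stubs where the actual system would use learned deep RL policies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

Fact = Tuple[str, ...]        # ground relational fact, e.g. ("at", "p1", "a")
State = FrozenSet[Fact]


@dataclass(frozen=True)
class Task:
    operator: str             # hypothetical operator name, e.g. "pickup"
    args: Tuple[str, ...]     # objects the operator is grounded on


def plan(agents: List[str], state: State, goal: State) -> Dict[str, List[Task]]:
    """Toy centralized planner: one pickup/drop pair per unmet goal fact,
    handed to agents round-robin (an assumption, not the paper's planner)."""
    pending = sorted(f for f in goal if f not in state)
    queues: Dict[str, List[Task]] = {a: [] for a in agents}
    for i, (_, passenger, dest) in enumerate(pending):
        agent = agents[i % len(agents)]
        queues[agent] += [Task("pickup", (passenger,)),
                          Task("drop", (passenger, dest))]
    return queues


def abstract(state: State, agent: str, task: Task) -> State:
    """Naive task-specific abstraction: keep only facts that mention the
    agent or one of the task's arguments."""
    relevant = set(task.args) | {agent}
    return frozenset(f for f in state if relevant & set(f[1:]))


# One low-level policy per operator. These are stubs; in a learned system
# each entry would be a trained policy over the abstract state.
POLICIES: Dict[str, Callable[[State, Task], str]] = {
    "pickup": lambda s, t: f"move_toward_{t.args[0]}",
    "drop": lambda s, t: f"move_toward_{t.args[1]}",
}


def joint_action(agents: List[str], state: State, goal: State) -> Dict[str, str]:
    """Each agent acts on the first task in its queue, seeing only its
    abstracted state; agents with no task do nothing."""
    queues = plan(agents, state, goal)
    actions: Dict[str, str] = {}
    for agent in agents:
        if not queues[agent]:
            actions[agent] = "noop"
            continue
        task = queues[agent][0]
        actions[agent] = POLICIES[task.operator](abstract(state, agent, task), task)
    return actions


state = frozenset({("at", "p1", "a"), ("at", "p2", "b"),
                   ("taxi_at", "t1", "c"), ("taxi_at", "t2", "d")})
goal = frozenset({("at", "p1", "x"), ("at", "p2", "y")})
print(joint_action(["t1", "t2"], state, goal))
```

Because each policy only sees facts about its own agent and task arguments, adding more passengers or taxis changes the planner's task queues but not the input that any single policy has to handle, which is the intuition behind generalizing across different numbers of objects.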
