SocialGFs: Learning Social Gradient Fields for Multi-Agent Reinforcement Learning

Qian Long, Fangwei Zhong, Mingdong Wu, Yizhou Wang, Song-Chun Zhu

TL;DR

SocialGFs introduce a data-driven, gradient-based representation for multi-agent reinforcement learning by learning social gradient fields offline through denoising score matching. This representation captures environmental, inter-agent, and intrinsic forces, enabling transfer across tasks and populations and addressing sparse rewards via gradient-enhanced shaping. The approach integrates with MAPPO and demonstrates superior performance and adaptability in grassland and cooperative navigation benchmarks, along with qualitative demonstrations of GF-guided behavior. By separating representation learning from policy learning, SocialGFs offer scalable, transferable generalization for dynamic MAS deployments with potential impact on autonomous systems like vehicles and robots.

Abstract

Multi-agent systems (MAS) need to adaptively cope with dynamic environments, changing agent populations, and diverse tasks. However, most multi-agent systems cannot easily handle these demands because of the complexity of the state and task spaces. Social impact theory regards the complex influencing factors as forces acting on an agent, emanating from the environment, other agents, and the agent's intrinsic motivation; these are referred to as social forces. Inspired by this concept, we propose a novel gradient-based state representation for multi-agent reinforcement learning. To non-trivially model the social forces, we further introduce a data-driven method, where we employ denoising score matching to learn the social gradient fields (SocialGFs) from offline samples, e.g., the attractive or repulsive outcomes of each force. During interactions, the agents take actions based on the multi-dimensional gradients to maximize their own rewards. In practice, we integrate SocialGFs into widely used multi-agent reinforcement learning algorithms, e.g., MAPPO. The empirical results reveal that SocialGFs offer four advantages for multi-agent systems: 1) they can be learned without requiring online interaction, 2) they demonstrate transferability across diverse tasks, 3) they facilitate credit assignment in challenging reward settings, and 4) they scale with an increasing number of agents.
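To make the idea of learning a gradient field from offline examples more concrete, the following is a minimal sketch of denoising score matching, not the authors' implementation. It assumes a linear score model fitted by least squares in place of a neural network, a single hypothetical attractive target (`grass`), and illustrative noise levels; the function name `dsm_fit_linear_score` is invented for this example.

```python
import numpy as np

def dsm_fit_linear_score(examples, sigma, rng):
    """Fit a linear score model s(x) = x @ W + b by denoising score matching."""
    noise = rng.normal(size=examples.shape)
    noisy = examples + sigma * noise
    # DSM regression target: the score of the Gaussian perturbation kernel,
    # grad log q(noisy | example) = (example - noisy) / sigma^2 = -noise / sigma.
    target = -noise / sigma
    # Least-squares fit of the target on the noisy samples (with a bias column).
    features = np.hstack([noisy, np.ones((len(noisy), 1))])
    params, *_ = np.linalg.lstsq(features, target, rcond=None)
    W, b = params[:-1], params[-1]
    return lambda x: np.asarray(x) @ W + b

rng = np.random.default_rng(0)
grass = np.array([1.0, -2.0])                        # hypothetical attractive target
examples = grass + 0.1 * rng.normal(size=(5000, 2))  # offline samples near the target
score = dsm_fit_linear_score(examples, sigma=0.5, rng=rng)

agent = np.array([3.0, 0.0])
g = score(agent)  # learned gradient at the agent's position
```

Under these assumptions the fitted gradient at `agent` points toward `grass`, which is the sense in which such a field acts as an attractive force. A learned neural score model would be needed for example distributions that are not well approximated by a single Gaussian.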

Paper Structure (29 sections, 5 equations, 9 figures, 5 tables, 2 algorithms)

This paper contains 29 sections, 5 equations, 9 figures, 5 tables, 2 algorithms.

Figures (9)

  • Figure 1: Learning Gradient Fields from Examples for Multi-agent Systems. We apply score matching to each set of examples to obtain a separate $gf$ function. For each task, we select a set of $gf$ functions and apply them to the observation to generate a gf-based representation (SocialGFs). We then use RL methods to train an adaptive agent on that representation. By employing different $gf$ functions in the representation, the agent can adapt to various scenarios.
  • Figure 2: This is an example of the social forces that affect sheep and wolves in a grassland. The red arrows represent the forces from wolves that repel sheep, while the gray arrows represent the forces from obstacles that prevent both sheep and wolves from escaping. The green arrow represents the force from the grass that attracts sheep, and the blue arrows represent the forces from sheep that attract wolves.
  • Figure 3: The offline examples that we used for learning gradient fields. The different colors of the balls indicate that they belong to different classes.
  • Figure 4: Visualization of the learned gradient fields for sheep in the grassland game. Red and green circles indicate the wolves and the grass, respectively.
  • Figure 5: Four games used in the experiments. Panel (a) shows the grassland game, where sheep work together to collect grass and avoid wolves, while the wolves cooperate to eat sheep. Panels (b), (c), and (d) show three variants of cooperative navigation games, where agents need to cooperatively reach different landmarks.
  • ...and 4 more figures
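
The captions for Figures 1 and 2 describe selecting a set of $gf$ functions per task and applying them to the observation. A rough sketch of how such a representation could be assembled is given below. It is an illustration under stated assumptions rather than the paper's code: each field is replaced by the analytic score of an equal-weight Gaussian mixture centred on the relevant entities, repulsion is modelled by a sign flip, and the names `mixture_score`, `gf_representation`, and `gf_spec` are hypothetical.

```python
import numpy as np

def mixture_score(pos, centers, sigma):
    """Gradient of the log-density of an equal-weight Gaussian mixture at pos."""
    diff = centers - pos                              # vectors from pos to each entity
    logw = -np.sum(diff**2, axis=1) / (2 * sigma**2)
    w = np.exp(logw - logw.max())                     # numerically stable responsibilities
    w /= w.sum()
    return (w[:, None] * diff).sum(axis=0) / sigma**2

def gf_representation(pos, entities, gf_spec):
    """Concatenate one gradient per selected force into a single vector."""
    parts = []
    for name, (sign, sigma) in gf_spec.items():
        # sign = +1 for an attractive force, -1 for a repulsive one (assumption).
        parts.append(sign * mixture_score(pos, entities[name], sigma))
    return np.concatenate(parts)

# Hypothetical grassland observation for one sheep.
entities = {
    "grass": np.array([[1.0, 0.0]]),
    "wolves": np.array([[0.0, -1.0], [5.0, 5.0]]),
}
# The task determines which forces are selected and how they act.
sheep_spec = {"grass": (+1.0, 1.0), "wolves": (-1.0, 1.0)}

rep = gf_representation(np.array([0.0, 0.0]), entities, sheep_spec)
```

Here `rep` has two entries per selected force: the first pair points toward the grass and the second pair points away from the nearest wolf. A different task would select a different `gf_spec`, and the resulting vector would then be passed to the RL policy as its input.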