Social Behavior as a Key to Learning-based Multi-Agent Pathfinding Dilemmas

Chengyang He, Tanishq Duhan, Parth Tulsyan, Patrick Kim, Guillaume Sartoretti

TL;DR

SYLPH addresses symmetry-induced dilemmas in learning-based MAPF by introducing dynamic Social Value Orientation (SVO) as a tunable social behavior, learned per interaction partner to break policy homogeneity. The framework combines partner selection, SVO-based rewards, and SMP3O (a PPO variant) to train a hierarchical policy that integrates social preferences with conventional action decisions. An attention-based network and a semantic transformer enable scalable, interpretable social signaling across agents, and tie-breaking is embedded in the policy itself rather than applied as post-processing. Empirical results on random, room-like, and maze maps, plus real-robot demonstrations, show that SYLPH improves coordination, reduces deadlocks, and performs competitively with state-of-the-art MAPF solvers, while retaining the scalability of parameter sharing.
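The TL;DR mentions an attention-based network through which agents read their neighbors' social signals. As a rough illustration only, the sketch below shows single-head scaled dot-product attention from one agent to a variable number of neighbors, with each neighbor's SVO angle appended to its features as (cos, sin). The embedding sizes, the random weights, and the name `attend_to_neighbors` are hypothetical; this is not SYLPH's actual network, which per the source also includes a semantic transformer.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def attend_to_neighbors(self_feat, neighbor_feats, neighbor_svos, w_q, w_k, w_v):
    """Single-head scaled dot-product attention from one agent to its neighbors.

    self_feat:      (d,)   ego embedding (hypothetical).
    neighbor_feats: (n, d) neighbor embeddings (hypothetical).
    neighbor_svos:  (n,)   neighbor SVO angles in radians, encoded as (cos, sin).
    w_q: (d, k); w_k, w_v: (d + 2, k) projection matrices.
    Returns the pooled (k,) context vector and the (n,) attention weights.
    """
    svo_feats = np.stack([np.cos(neighbor_svos), np.sin(neighbor_svos)], axis=-1)
    tokens = np.concatenate([neighbor_feats, svo_feats], axis=-1)  # (n, d + 2)
    query = self_feat @ w_q                                        # (k,)
    keys = tokens @ w_k                                            # (n, k)
    values = tokens @ w_v                                          # (n, k)
    scores = keys @ query / np.sqrt(query.shape[-1])               # (n,)
    weights = softmax(scores)                                      # (n,)
    return weights @ values, weights


# Usage with random (untrained) weights: 3 neighbors with different SVOs.
rng = np.random.default_rng(0)
d, k = 8, 4
ego = rng.normal(size=d)
nbrs = rng.normal(size=(3, d))
svos = np.array([0.0, np.pi / 4, np.pi / 2])
w_q, w_k, w_v = rng.normal(size=(d, k)), rng.normal(size=(d + 2, k)), rng.normal(size=(d + 2, k))
context, weights = attend_to_neighbors(ego, nbrs, svos, w_q, w_k, w_v)
```

Because the softmax pooling sums over neighbors, the output does not depend on neighbor ordering and works for any neighbor count, which is one reason attention fits parameter-shared policies over variable-size neighborhoods.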

Abstract

The Multi-agent Path Finding (MAPF) problem involves finding collision-free paths for a team of agents in a known, static environment, with important applications in warehouse automation, logistics, and last-mile delivery. To meet the needs of these large-scale applications, current learning-based methods often deploy the same fully trained, decentralized network to all agents to improve scalability. However, such parameter sharing typically results in homogeneous behaviors among agents, which may prevent them from breaking ties around symmetric conflicts (e.g., at bottlenecks) and can lead to livelocks or deadlocks. In this paper, we propose SYLPH, a novel learning-based MAPF framework that aims to mitigate the adverse effects of homogeneity by allowing agents to learn and dynamically select different social behaviors (akin to individual, dynamic roles), without sacrificing the scalability offered by parameter sharing. Specifically, SYLPH agents learn to select their Social Value Orientation (SVO) given the situation at hand, which quantifies their own level of selfishness or altruism, as well as an SVO-conditioned MAPF policy that dictates their movement actions. To this end, each agent first determines the most influential other agent in the system by predicting future conflicts and interactions with other agents. Each agent then selects its own SVO towards that agent and trains its decentralized MAPF policy to enact this SVO until another agent becomes more influential. To further allow agents to consider each other's social preferences, each agent has access to the SVOs of its neighbors. As a result of this hierarchical decision-making and exchange of social preferences, SYLPH endows agents with the ability to reason about the MAPF task in richer latent spaces and more nuanced contexts, leading to varied responses that can help break ties around symmetric conflicts. [...]
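The abstract describes each agent choosing an SVO toward its most influential partner and keeping it until another agent becomes more influential. A common way to encode SVO in multi-agent RL is as an angle φ that mixes an agent's own reward with its partner's, r = cos(φ)·r_self + sin(φ)·r_partner. The sketch below assumes SYLPH's SVO-based reward takes a similar form (the exact equation is not given in this summary), and illustrates the temporally extended choice by re-selecting the SVO only on a partner change. `SocialState` and the `choose_svo` callable (a stand-in for the learned high-level SVO policy) are hypothetical names.

```python
import math
from typing import Callable, Optional


def svo_reward(r_self: float, r_partner: float, svo_angle: float) -> float:
    """Mix own and partner reward by an SVO angle (radians).

    0 -> individualistic (own reward only), pi/4 -> prosocial (equal weights),
    pi/2 -> altruistic (partner reward only).
    """
    return math.cos(svo_angle) * r_self + math.sin(svo_angle) * r_partner


class SocialState:
    """Keeps the current partner and SVO; re-chooses the SVO only on a partner change."""

    def __init__(self) -> None:
        self.partner: Optional[int] = None  # no partner yet
        self.svo: float = 0.0               # individualistic default (assumption)

    def update(self, new_partner: Optional[int],
               choose_svo: Callable[[Optional[int]], float]) -> float:
        if new_partner != self.partner:
            self.partner = new_partner
            self.svo = choose_svo(new_partner)  # hypothetical high-level SVO policy
        return self.svo


# Same transition, different SVOs: pushing through gains +1.0 for the agent
# but costs its partner 0.5, so the two agents receive different rewards.
selfish = svo_reward(1.0, -0.5, 0.0)
altruist = svo_reward(1.0, -0.5, math.pi / 2)
```

Two parameter-sharing agents in an identical situation thus receive different learning signals once their SVOs differ, which is the mechanism the abstract credits with breaking behavioral homogeneity.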

Paper Structure (34 sections, 1 theorem, 21 equations, 13 figures, 5 tables, 3 algorithms)

Key Result

Theorem 1

The function $f(x)$ is monotonically non-increasing in $x$ for any valid values of $a$ and $b$.

Figures (13)

  • Figure 1: A simple example illustrating the difference between a completely selfish team and a team with diverse social roles. The top two panels show that, when facing a symmetric challenge, a team of selfish agents falls into a social dilemma. In contrast, agents with different SVOs can more easily achieve cooperation by breaking the homogeneity of their behavior patterns, benefiting from a mix of individualism and prosociality within the team, as shown in the bottom two panels.
  • Figure 2: The key components and overall architecture of SYLPH. By introducing social preference into the MAPF framework as a temporally extended skill, the agent is equipped with social behavior to better cope with social dilemmas such as symmetry and blocking problems.
  • Figure 3: Overlap between agents' optimal path flows for different start and goal configurations on the same map. In scenario (a), two agents traverse a narrow corridor in the same direction, which results in minimal conflict. In scenario (b), the agents must traverse the same space in opposite directions, which significantly raises the potential for conflict because their intended paths directly oppose each other. According to Algorithm 1, the overlap computed for scenario (a) is markedly smaller than that for scenario (b), consistent with intuition and with the goal of managing social dilemmas in multi-agent path finding. The larger overlap in scenario (b) indicates a higher degree of conflict, calling for stronger intervention or strategy adjustment to avoid collision or deadlock, i.e., a situation of greater social distress.
  • Figure 4: The SVO-based tie-breaking mechanism example.
  • Figure 5: Overview of the Network of SYLPH.
  • ...and 8 more figures
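The Figure 3 caption describes an overlap score, computed by Algorithm 1, that is small when agents share a corridor in the same direction and large when they traverse it in opposite directions; the abstract adds that each agent picks the most influential other agent as its partner. Algorithm 1 is not reproduced in this summary, so the sketch below is a hypothetical stand-in that only matches that qualitative ordering: it computes BFS shortest paths on a 4-connected grid, scores each shared cell by (1 - d_i · d_j) / 2 (0 for the same direction, 1 for opposite), and selects the highest-scoring agent as the partner. The functions `shortest_path`, `opposing_overlap`, and `select_partner` are illustrative names, not the paper's.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def shortest_path(grid: List[str], start: Cell, goal: Cell) -> List[Cell]:
    """BFS shortest path on a 4-connected grid ('#' = obstacle); [] if unreachable."""
    rows, cols = len(grid), len(grid[0])
    parent: Dict[Cell, Optional[Cell]] = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            break
        for dr, dc in MOVES:
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] != '#' and nxt not in parent):
                parent[nxt] = cur
                queue.append(nxt)
    if goal not in parent:
        return []
    path = [goal]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def flow(path: List[Cell]) -> Dict[Cell, Cell]:
    """Map each cell on the path to the unit move taken out of it."""
    return {a: (b[0] - a[0], b[1] - a[1]) for a, b in zip(path, path[1:])}


def opposing_overlap(path_i: List[Cell], path_j: List[Cell]) -> float:
    """Sum of (1 - d_i . d_j) / 2 over shared cells: 0 same direction, 1 opposite."""
    fi, fj = flow(path_i), flow(path_j)
    total = 0.0
    for cell in fi.keys() & fj.keys():
        (a, b), (c, d) = fi[cell], fj[cell]
        total += (1 - (a * c + b * d)) / 2
    return total


def select_partner(i: int, paths: List[List[Cell]]) -> Optional[int]:
    """Pick the agent with the largest opposing overlap; None if no conflict."""
    scores = {j: opposing_overlap(paths[i], paths[j])
              for j in range(len(paths)) if j != i}
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None


# Corridor example: agents 0 and 1 move right, agent 2 moves left.
corridor = ["......."]
starts_goals = [((0, 0), (0, 4)), ((0, 1), (0, 5)), ((0, 6), (0, 2))]
paths = [shortest_path(corridor, s, g) for s, g in starts_goals]
partners = [select_partner(i, paths) for i in range(len(paths))]
```

This score ignores timing, so two agents that would cross the same cell at very different times still count as conflicting; a spatiotemporal variant would need time-indexed paths.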

Theorems & Definitions (3)

  • Remark
  • Theorem
  • Proof