Distributed Online Planning for Min-Max Problems in Networked Markov Games

Alexandros E. Tzikas, Jinkyoo Park, Mykel J. Kochenderfer, Ross E. Allen

TL;DR

This work tackles distributed multi-agent Markov games with an egalitarian objective by proposing a two-phase algorithm that combines online planning with distributed convex optimization. Each agent uses planning to build a concave local return approximation of its neighborhood's future reward and then participates in a distributed subgradient method to pick the next joint action, under neighborhood-based rewards and network communication constraints. The approach is demonstrated on formation control tasks, showing robust improvements over baselines and competitive performance against open-loop optimal solutions, while highlighting practical challenges like oscillations and computational load. The framework provides a principled pathway to scalable, networked, fair decision-making in multi-robot systems, with avenues for theoretical convergence guarantees and efficiency enhancements.

Abstract

Min-max problems are important in multi-agent sequential decision-making because they improve the performance of the worst-performing agent in the network. However, solving the multi-agent min-max problem is challenging. We propose a modular, distributed, online planning-based algorithm that is able to approximate the solution of the min-max objective in networked Markov games, assuming that the agents communicate within a network topology and the transition and reward functions are neighborhood-dependent. This set-up is encountered in the multi-robot setting. Our method consists of two phases at every planning step. In the first phase, each agent obtains sample returns based on its local reward function, by performing online planning. Using the samples from online planning, each agent constructs a concave approximation of its underlying local return as a function of only the action of its neighborhood at the next planning step. In the second phase, the agents deploy a distributed optimization framework that converges to the optimal immediate next action for each agent, based on the function approximations of the first phase. We demonstrate our algorithm's performance through formation control simulations.
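The second phase can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a scalar action, hypothetical concave quadratic surrogates standing in for the phase-one approximations, and a centralized projected subgradient ascent in place of the distributed method. The function name `subgradient_max_min` and all numbers are invented for illustration.

```python
def subgradient_max_min(surrogates, grads, a0, lo, hi, steps=500):
    """Projected subgradient ascent on g(a) = min_i f_i(a) over [lo, hi].

    surrogates: concave functions f_i (assumed given, e.g. from a fit to samples)
    grads: their derivatives, in the same order
    """
    a = a0
    best_a, best_val = a0, min(f(a0) for f in surrogates)
    for k in range(1, steps + 1):
        # A subgradient of a pointwise minimum of concave functions is the
        # gradient of any function that attains the minimum (the worst agent).
        worst = min(range(len(surrogates)), key=lambda i: surrogates[i](a))
        a = a + (1.0 / k) * grads[worst](a)  # diminishing step size 1/k
        a = max(lo, min(hi, a))              # projection onto the action interval
        val = min(f(a) for f in surrogates)
        if val > best_val:                   # subgradient steps are not monotone,
            best_a, best_val = a, val        # so keep the best iterate seen
    return best_a, best_val


# Hypothetical surrogates: agent i prefers the action centers[i].
centers = [-1.0, 0.5, 2.0]
surrogates = [lambda a, c=c: -(a - c) ** 2 for c in centers]
grads = [lambda a, c=c: -2.0 * (a - c) for c in centers]

best_a, best_val = subgradient_max_min(surrogates, grads, a0=0.0, lo=-3.0, hi=3.0)
```

In this toy instance the max-min action lies midway between the two extreme preferences, at 0.5, where the worst surrogate value is -2.25. The iterates oscillate around that point with shrinking amplitude, which is why the sketch tracks the best iterate rather than returning the last one.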

Paper Structure

This paper contains 18 sections, 1 theorem, 19 equations, 3 figures, 1 algorithm.

Key Result

Theorem 1

For a problem of the form of the paper's main min-max problem (eq:main_problem), the principle of dynamic programming does not necessarily hold.
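A small constructed example, not taken from the paper's proof, shows how this kind of failure can arise when the objective is the worst agent's cumulative reward. It assumes two agents, a deterministic two-stage problem, and invented reward values. The tail of the best overall plan differs from the plan that is best for the tail subproblem on its own.

```python
from itertools import product

# Hypothetical per-agent rewards (agent 1, agent 2) for each action.
first_stage = {"A": (3, 0)}                # only one first-stage action
second_stage = {"C": (1, 1), "D": (0, 3)}  # two second-stage actions


def worst_return(*stage_rewards):
    """Minimum over agents of the summed rewards across the given stages."""
    totals = [sum(per_agent) for per_agent in zip(*stage_rewards)]
    return min(totals)


# Best full plan under the max-min criterion, found by brute force.
best_plan = max(
    product(first_stage, second_stage),
    key=lambda plan: worst_return(first_stage[plan[0]], second_stage[plan[1]]),
)

# Best action for the second stage considered in isolation.
best_tail = max(second_stage, key=lambda b: worst_return(second_stage[b]))
```

Here `best_plan` is `("A", "D")` with worst-agent return 3, while `best_tail` is `"C"`. Solving the tail subproblem first and then prepending the first stage would therefore give a worst-agent return of 1. The optimal continuation depends on the reward each agent has already accumulated, which a backward recursion over states alone does not capture.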

Figures (3)

  • Figure 1: Instantaneous reward of the worst-performing agent for $\mathcal{G}_1$. Five agents communicate over an almost fully connected communication network (only agent 1 is not a neighbor of agent 5). We observe that our proposed method performs better than the baselines on the max-min criterion.
  • Figure 2: Instantaneous reward of the worst-performing agent for a switching topology of five agents. Every $10$ timesteps the topology changes between $\mathcal{G}_1$, an almost fully connected topology, and $\mathcal{G}_2$, which is a cyclic graph. We observe that our proposed method performs better than the baselines on the max-min criterion.
  • Figure 3: Instantaneous reward of the worst-performing agent for $\mathcal{G}_3$. Eight agents communicate over the topology 1-2-3-4-5-6-7-8. The worst-agent cumulative reward of our method is -1188, while the worst-agent return of the POMCPOW baseline is -2338. We observe that our proposed method performs better than the baselines with respect to the max-min criterion.

Theorems & Definitions (3)

  • Definition 1
  • Theorem 1
  • proof