Achieving Pareto Optimality in Games via Single-bit Feedback

Seref Taha Kiremitci, Ahmed Said Donmez, Muhammed O. Sayin

TL;DR

Coordination in multi-agent systems under severe communication constraints is challenging. The authors introduce SBC-PE, a fully decentralized explore-then-commit mechanism that uses a single-bit signal per agent per round to maximize the social welfare $W(a)=\sum_{i=1}^n w_i\,u_i(a)$ in arbitrary finite games. Key contributions include (i) a simple, state-free protocol that requires only one-bit communication, (ii) finite-time guarantees with $\mathbb{E}[R_T]=O(\log T)$ and an explicit exploration length $K=\tfrac{\log(4MT\xi^2)}{2\xi^2}$, and (iii) rigorous regret analysis corroborated by simulations showing scalability and robustness. The work demonstrates that scalable welfare optimization is achievable under minimal communication, with convergence to the exact Pareto-optimal joint action in finite time.
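As a rough illustration of the two formulas quoted above, the following sketch evaluates the weighted welfare $W(a)=\sum_{i=1}^n w_i\,u_i(a)$ on a small game and computes the exploration length $K$. This is not the SBC-PE protocol itself. The two-agent game, the weights, and the function names are hypothetical. The summary does not define $M$ and $\xi$, so they are treated here as plain numeric inputs.

```python
import math
from itertools import product


def social_welfare(weights, utilities, joint_action):
    """Weighted welfare W(a) = sum_i w_i * u_i(a)."""
    return sum(w * u(joint_action) for w, u in zip(weights, utilities))


def exploration_length(M, T, xi):
    """K = log(4 * M * T * xi^2) / (2 * xi^2), as quoted in the TL;DR.

    M and xi are not defined in this summary; they are assumed to be
    positive numbers supplied by the caller.
    """
    return math.log(4 * M * T * xi ** 2) / (2 * xi ** 2)


# Hypothetical two-agent, two-action coordination game (illustration only):
# both agents earn 1.0 at (1, 1), 0.5 at (0, 0), and 0.0 otherwise.
def u_shared(a):
    if a == (1, 1):
        return 1.0
    if a == (0, 0):
        return 0.5
    return 0.0


weights = [1.0, 1.0]
utilities = [u_shared, u_shared]

# Brute-force the welfare-maximizing joint action over all four profiles.
joint_actions = list(product([0, 1], repeat=2))
best_action = max(joint_actions, key=lambda a: social_welfare(weights, utilities, a))

# K grows logarithmically in the horizon T for fixed M and xi.
K_small = exploration_length(M=4, T=10_000, xi=0.5)
K_large = exploration_length(M=4, T=100_000, xi=0.5)
```

Multiplying the horizon by 10 raises $K$ by only $\log(10)/(2\xi^2)$, which is consistent with the logarithmic dependence on $T$ stated in the regret bound.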

Abstract

Efficient coordination in multi-agent systems often incurs high communication overhead or slow convergence rates, making scalable welfare optimization difficult. We propose Single-Bit Coordination Dynamics for Pareto-Efficient Outcomes (SBC-PE), a decentralized learning algorithm requiring only a single-bit satisfaction signal per agent each round. Despite this extreme efficiency, SBC-PE guarantees convergence to the exact optimal solution in arbitrary finite games. We establish explicit regret bounds, showing expected regret grows only logarithmically with the horizon, i.e., $O(\log T)$. Compared with prior payoff-based or bandit-style rules, SBC-PE uniquely combines minimal signaling, general applicability, and finite-time guarantees. These results show scalable welfare optimization is achievable under minimal communication constraints.

Paper Structure

This paper contains 7 sections, 22 equations, 2 figures, 1 algorithm.

Figures (2)

  • Figure 1: Performance of Algorithm 1 for $n=10$ agents. The x-axis is the exploration length $K$ (log scale), and the y-axis is the total utility. Solid lines depict the committed joint action $\bar{a}$, dashed lines the most-frequently-played content-endorsed joint action, and the black dotted line the optimal utility.
  • Figure 2: Limit values of $\epsilon$ as a function of the welfare gap $\Delta_1$ for $n=10$ agents and two actions. The blue curve indicates the mean of the limiting $\epsilon$ values, while the orange curve shows the minimum acceptable $\epsilon$.