Socially-Weighted Alignment: A Game-Theoretic Framework for Multi-Agent LLM Systems

Furkan Mumcu, Yasin Yilmaz

TL;DR

The paper proposes Socially-Weighted Alignment (SWA), a game-theoretic framework that modifies inference-time decision making by interpolating between an agent's private objective and an estimate of group welfare via a social weight. SWA induces a critical threshold on that weight, above which agents no longer have a marginal incentive to increase demand under overload.

Abstract

Deploying large language model (LLM) agents in shared environments introduces a fundamental tension between individual alignment and collective stability: locally rational decisions can impose negative externalities that degrade system-level performance. We propose Socially-Weighted Alignment (SWA), a game-theoretic framework that modifies inference-time decision making by interpolating between an agent's private objective and an estimate of group welfare via a social weight $\lambda\in[0,1]$. In a shared-resource congestion game with $n$ agents and congestion severity $\beta$, we show that SWA induces a critical threshold $\lambda^*=(n-\beta)/(n-1)$ above which agents no longer have marginal incentive to increase demand under overload, yielding a phase transition from persistent congestion to stable operation near capacity. We further provide an inference-time algorithmic instantiation of SWA that does not require parameter updates or multi-agent reinforcement learning, and use a multi-agent simulation to empirically validate the predicted threshold behavior.
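As a quick illustration of the threshold stated in the abstract, the following minimal Python sketch evaluates $\lambda^*=(n-\beta)/(n-1)$ for given $n$ and $\beta$. The function name `critical_social_weight` is hypothetical and is not taken from the paper; only the formula itself comes from the abstract.

```python
def critical_social_weight(n: int, beta: float) -> float:
    """Evaluate lambda* = (n - beta) / (n - 1), the threshold quoted in the abstract.

    `n` is the number of agents and `beta` the congestion severity.
    The function name is illustrative, not the authors' code.
    """
    if n < 2:
        raise ValueError("the formula needs at least two agents (n >= 2)")
    return (n - beta) / (n - 1)


if __name__ == "__main__":
    # Settings reported in the figure captions: n = 5, beta in {1.6, 3}.
    for beta in (1.6, 3.0):
        print(f"n=5, beta={beta}: lambda* = {critical_social_weight(5, beta):.2f}")
```

For the settings used in the figures ($n=5$, $\beta\in\{1.6,3\}$), this reproduces the thresholds $0.85$ and $0.5$ quoted in the captions.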

Paper Structure

This paper contains 26 sections, 2 theorems, 26 equations, 4 figures.

Key Result

Proposition 1

If agents are purely self-interested ($\lambda=0$) and the penalty is sufficiently diluted such that $\beta<n$, then any best response increases consumption whenever $X>C$. In particular, if $x_{\max}<\infty$, the Nash equilibrium is maximal feasible consumption ($x_i=x_{\max}$ for all $i$); if $x_{
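The baseline incentive in Proposition 1 can be checked numerically. The sketch below assumes a per-agent reward of the form $r_i = x_i - (\beta/n)\max(0, X-C)$. This form is an assumption, chosen because it sums to the welfare expression $X-\beta\max(0,X-C)$ given in the figure captions and matches the "diluted penalty" wording; the paper's exact definition is not reproduced here. The helper names `private_reward` and `marginal_gain` are hypothetical.

```python
def private_reward(x_i: float, others_total: float, capacity: float,
                   beta: float, n: int) -> float:
    """Assumed private reward: own consumption minus a 1/n share of the overload penalty."""
    total = x_i + others_total
    return x_i - (beta / n) * max(0.0, total - capacity)


def marginal_gain(x_i: float, others_total: float, capacity: float,
                  beta: float, n: int, step: float = 1.0) -> float:
    """Change in the assumed private reward when agent i raises its demand by `step`."""
    before = private_reward(x_i, others_total, capacity, beta, n)
    after = private_reward(x_i + step, others_total, capacity, beta, n)
    return after - before


if __name__ == "__main__":
    # Overloaded state: X = 25 > C = 20 with n = 5 agents.
    print(marginal_gain(5.0, 20.0, 20.0, beta=1.6, n=5))  # positive when beta < n
    print(marginal_gain(5.0, 20.0, 20.0, beta=6.0, n=5))  # negative when beta > n
```

Under this assumed reward, the gain per unit of extra demand while overloaded is $1-\beta/n$, which is positive exactly when $\beta<n$, in line with the condition stated in the proposition.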

Figures (4)

  • Figure 1: Overload rate versus social alignment. Overload rate (OR) as a function of the social alignment coefficient $\lambda$ in the $n=5$ congestion environment with capacity $C=20$ and congestion severity $\beta=1.6$, evaluated across five base language models. OR is the fraction of timesteps in a $T=20$ step episode for which the aggregate load exceeds capacity, $X_t>C$. Consistent with Theorem 1, overload remains high for small $\lambda$ and declines sharply as $\lambda$ approaches the predicted threshold $\lambda^*=(n-\beta)/(n-1)=0.85$, with residual differences across models attributable to stochastic candidate generation and finite candidate sets.
  • Figure 2: Welfare improvement relative to baseline. Change in episode-average realized welfare relative to the $\lambda=0$ baseline, $\Delta\overline{W}(\lambda)=\overline{W}(\lambda)-\overline{W}(0)$, for the same setting as Figure 1. Realized welfare at time $t$ is $W_t=\sum_{i=1}^{n} r_{i,t}=X_t-\beta\max(0,X_t-C)$ and $\overline{W}(\lambda)=\frac{1}{T}\sum_{t=1}^{T}W_t$ with $T=20$. Across models, welfare improves most strongly in the same range of $\lambda$ where overload collapses, indicating that SWI reduces congestion losses while maintaining high utilization near capacity rather than inducing uniformly low demand.
  • Figure 3: Ablation on congestion severity: overload transition shifts with $\beta$. Overload rate (OR) as a function of $\lambda$ for Microsoft Phi-3.5-mini-instruct under two congestion severities, $\beta\in\{1.6,3\}$, with $n=5$, $C=20$, and $T=20$ fixed. Vertical dashed lines mark the theoretical thresholds from Theorem 1, $\lambda^*=(n-\beta)/(n-1)$, yielding $\lambda^*=0.85$ for $\beta=1.6$ and $\lambda^*=0.5$ for $\beta=3$; shaded regions indicate the predicted stable side $\lambda\ge\lambda^*$. Empirically, the onset of near-zero overload occurs at substantially lower $\lambda$ for larger $\beta$, consistent with the predicted shift in the transition.
  • Figure 4: Ablation on congestion severity: larger $\beta$ amplifies welfare losses under overload. Episode-average realized welfare for Microsoft Phi-3.5-mini-instruct under $\beta\in\{1.6,3\}$ as a function of $\lambda$ in the same environment as Figure 3 ($n=5$, $C=20$, $T=20$). Realized welfare is $W_t=\sum_{i=1}^{n} r_{i,t}=X_t-\beta\max(0,X_t-C)$ and is averaged over timesteps within an episode. When $\lambda$ is below the corresponding threshold, overload is frequent and welfare is substantially lower for $\beta=3$ than for $\beta=1.6$, reflecting the stronger congestion penalty. For $\lambda\ge\lambda^*$, overload is eliminated and welfare recovers to the near-capacity regime.
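The two metrics used in the captions above, overload rate and episode-average realized welfare, can be computed from a sequence of aggregate loads $X_t$ as in the following minimal sketch. The function names and the example load trajectory are illustrative assumptions, not data or code from the paper; only the formulas $\mathrm{OR}=\frac{1}{T}|\{t: X_t>C\}|$ and $W_t=X_t-\beta\max(0,X_t-C)$ come from the captions.

```python
from typing import Sequence


def overload_rate(loads: Sequence[float], capacity: float) -> float:
    """Fraction of timesteps whose aggregate load strictly exceeds capacity (X_t > C)."""
    if not loads:
        raise ValueError("loads must contain at least one timestep")
    return sum(1 for x in loads if x > capacity) / len(loads)


def realized_welfare(load: float, capacity: float, beta: float) -> float:
    """W_t = X_t - beta * max(0, X_t - C), as given in the figure captions."""
    return load - beta * max(0.0, load - capacity)


def mean_welfare(loads: Sequence[float], capacity: float, beta: float) -> float:
    """Episode-average welfare: the mean of W_t over all timesteps."""
    if not loads:
        raise ValueError("loads must contain at least one timestep")
    return sum(realized_welfare(x, capacity, beta) for x in loads) / len(loads)


if __name__ == "__main__":
    # Made-up four-step trajectory, for illustration only.
    example_loads = [18.0, 20.0, 22.0, 25.0]
    print(overload_rate(example_loads, capacity=20.0))
    print(mean_welfare(example_loads, capacity=20.0, beta=1.6))
```

The welfare change plotted in Figure 2 would then be the difference between `mean_welfare` at a given $\lambda$ and at $\lambda=0$, computed on the corresponding load trajectories.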

Theorems & Definitions (6)

  • Definition 1: SWA social utility
  • Definition 2: SWA Nash equilibrium
  • Proposition 1: Tragedy of the Commons (Baseline)
  • proof
  • Theorem 1: Stability Condition under SWA (Average Welfare)
  • proof
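
The threshold in Theorem 1 can be recovered with a short calculation under two assumptions that are not spelled out on this page: that the private reward is $r_i = x_i - \frac{\beta}{n}\max(0, X-C)$ (which sums to the welfare $W = X-\beta\max(0,X-C)$ given in the figure captions), and that the SWA utility mixes this reward with average welfare as $U_i = (1-\lambda)\,r_i + \lambda\,W/n$. The following is a sketch of the argument under those assumptions, not the paper's proof.

```latex
% Assumed forms (not quoted from the paper):
%   r_i = x_i - (\beta/n)\max(0, X - C),   W = \sum_i r_i = X - \beta\max(0, X - C),
%   U_i = (1-\lambda)\, r_i + \lambda\, W/n .
\begin{align*}
\text{Under overload } (X > C):\quad
\frac{\partial r_i}{\partial x_i} &= 1 - \frac{\beta}{n},
\qquad
\frac{\partial (W/n)}{\partial x_i} = \frac{1 - \beta}{n}, \\[4pt]
\frac{\partial U_i}{\partial x_i}
&= (1-\lambda)\Bigl(1 - \frac{\beta}{n}\Bigr) + \lambda\,\frac{1-\beta}{n}
 = \frac{(n-\beta) - \lambda\,(n-1)}{n}, \\[4pt]
\frac{\partial U_i}{\partial x_i} \le 0
&\iff \lambda \ge \frac{n-\beta}{n-1} = \lambda^{*}.
\end{align*}
```

With $n=5$ and $\beta=1.6$ this gives $\lambda^*=3.4/4=0.85$, and with $\beta=3$ it gives $\lambda^*=2/4=0.5$, matching the values quoted in the captions of Figures 1 and 3.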