
Overcoming the Price of Anarchy by Steering with Recommendations

Cesare Carissimo, Marcin Korecki, Damian Dailisan

TL;DR

This work addresses how the Price of Anarchy can be mitigated in networks where agents learn and adapt, by introducing a route recommender that steers learning dynamics in a Braess-like congestion game. It formalizes the Learning Dynamic Manipulation Problem (LDMP), modeling recommendations as observable states for $Q$-learners and optimizing a KL-divergence-based welfare objective toward the social optimum. The authors prove that enlarging the recommendation space increases the steering potential, and they demonstrate via simulations that optimized, time-varying recommendations can drive large populations toward near-optimal social welfare, while constant recommendations fail to steer. The results reveal a fundamental trade-off between welfare and alignment: more aggressive steering can yield better social outcomes but may reduce user trust. The authors also discuss ethical implications and future work toward scalable, robust, and responsible deployment in real systems.
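
The TL;DR mentions a KL-divergence-based welfare objective. The sketch below is a minimal illustration, assuming the objective compares the socially optimal action distribution $p^* = (1/2, 1/2, 0)$ over (up, down, cross) with the population's empirical action distribution $p$ via $D_{\mathrm{KL}}(p^* \| p)$. The direction of the divergence, the helper names, and `steering_loss` itself are assumptions, not the paper's exact formulation.

```python
import math

UP, DOWN, CROSS = 0, 1, 2
# Assumed socially optimal split: half up, half down, nobody crosses.
SOCIAL_OPTIMUM = [0.5, 0.5, 0.0]


def action_distribution(actions, n_actions=3):
    """Empirical fraction of agents choosing each action."""
    counts = [0] * n_actions
    for a in actions:
        counts[a] += 1
    return [c / len(actions) for c in counts]


def kl_divergence(p, q):
    """D_KL(p || q) in nats; terms with p_i = 0 contribute nothing."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue
        if q_i == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total


def steering_loss(actions):
    """Hypothetical objective: divergence of the population from the optimum."""
    return kl_divergence(SOCIAL_OPTIMUM, action_distribution(actions))
```

With 100 agents, an even up/down split gives a loss of 0.0, a 25/25/50 up/down/cross split gives ln 2 (about 0.693), and everyone crossing gives infinity because the optimum puts mass on actions nobody takes.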

Abstract

Varied real-world systems such as transportation networks, supply chains and energy grids present coordination problems where many agents must learn to share resources. It is well known that the independent and selfish interactions of agents in these systems may lead to inefficiencies, often referred to as the "Price of Anarchy". Effective interventions that reduce the Price of Anarchy while preserving individual autonomy are of great interest. In this paper we explore recommender systems as one such intervention mechanism. We start with the Braess Paradox, a congestion game model of a routing problem related to traffic on roads, packets on the internet, and electricity on power grids. Following recent literature, we model the interactions of agents as a repeated game between $Q$-learners, a common type of reinforcement learning agent. This work introduces the Learning Dynamic Manipulation Problem, where an external recommender system can strategically trigger behavior by picking the states observed by $Q$-learners during learning. Our computational contribution demonstrates that appropriately chosen recommendations can robustly steer the system towards convergence to the social optimum, even for many players. Our theoretical and empirical results highlight that increases in the recommendation space can increase the steering potential of a recommender system, which should be considered in the design of recommender systems.
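
To make the Braess congestion game concrete, here is a minimal Python sketch of travel costs and social welfare. It assumes the standard Braess layout (the first upper and second lower edges cost $x$, the other two edges cost 1, and the crossing link costs 0). It also assumes that social welfare is the negative mean travel time, which is consistent with the $[-2,-1.5]$ range used to rescale welfare in the figure captions. Both choices, and all function names, are assumptions rather than the authors' code.

```python
UP, DOWN, CROSS = 0, 1, 2


def travel_costs(actions):
    """Cost of each path given every agent's action (assumed edge costs)."""
    n = len(actions)
    # Fraction of agents on each variable-cost ("x") edge.
    first_upper = sum(1 for a in actions if a in (UP, CROSS)) / n
    second_lower = sum(1 for a in actions if a in (DOWN, CROSS)) / n
    return {
        UP: first_upper + 1.0,               # x, then constant 1
        DOWN: 1.0 + second_lower,            # constant 1, then x
        CROSS: first_upper + 0.0 + second_lower,  # x, free link, x
    }


def social_welfare(actions):
    """Negative mean travel time: higher is better."""
    costs = travel_costs(actions)
    return -sum(costs[a] for a in actions) / len(actions)


def rescale(welfare):
    """Map [-2, -1.5] to [0, 1], as in the figure captions."""
    return (welfare + 2.0) / 0.5


# Everyone crossing (the Nash equilibrium) vs. an even up/down split.
print(social_welfare([CROSS] * 100), social_welfare([UP] * 50 + [DOWN] * 50))
```

Under these assumptions, all agents crossing gives welfare -2 (rescaled to 0), and an even up/down split gives -1.5 (rescaled to 1). This matches the endpoints of the rescaling described for the figures.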

Paper Structure

This paper contains 39 sections, 2 theorems, 10 equations, 12 figures, 1 table, 1 algorithm.

Key Result

Theorem 3.2

(Increasing Reachability) When the size $m$ of the recommendation space is increased, which amounts to adding rows to a $q$-table ${\bm{Q}}$, the size of the reachable set $R({\bm{Q}})$ is monotonically non-decreasing.
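
The intuition behind Theorem 3.2 can be sketched in a few lines of Python. This assumes that an agent's reachable behaviors are the greedy actions of the rows a recommender can select. Definition 3.1 in the paper may be more general, and the function names are hypothetical. Adding a row only adds one more candidate to the set, so the set can never shrink.

```python
from itertools import product


def greedy_action(row):
    """Index of the largest q-value; max() keeps the first maximum, so ties go low."""
    return max(range(len(row)), key=lambda a: row[a])


def reachable_set(q_table):
    """Actions one agent can be induced to take greedily, one per recommendation row."""
    return {greedy_action(row) for row in q_table}


def joint_reachable_set(q_tables):
    """Joint action profiles reachable when each agent gets its own recommendation."""
    return set(product(*(sorted(reachable_set(q)) for q in q_tables)))


q_small = [[1.0, 0.0, 0.5]]            # m = 1 recommendation, k = 3 actions
q_large = q_small + [[0.0, 2.0, 1.0]]  # add one row: m = 2
assert reachable_set(q_small) <= reachable_set(q_large)  # never shrinks
```

Here the extra row lets the recommender induce action 1 in addition to action 0. This is an illustration of the monotonicity claim, not the paper's proof.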

Figures (12)

  • Figure 1: Graphical depiction of the main innovation and contribution: steering a multi-agent system of $Q$-learners with a recommender system towards a system optimum.
  • Figure 2: Illustration of the initial network (a) and the augmented network (b) in the Braess Paradox. Agents start in state "S" and pick a path to reach state "t". The numbers represent the cost of traveling over a link; a cost of $x$ equals the fraction of agents that choose that link. Two actions are possible in (a): $\mathit{up}$ takes the upper edges, and $\mathit{down}$ takes the lower edges. In (b) an additional action, $\mathit{cross}$, is possible; it takes the first upper edge, crosses to the lower section at the middle, and finishes on the second lower edge. Rational, fully informed agents all pick the crossing link in the augmented network (the Nash equilibrium), which leads to high congestion and the worst possible social welfare.
  • Figure 3: Learning of 100 $\epsilon$-greedy tabular $Q$-learners ($\alpha=0.1$, $\gamma=0.8$) in the Braess Paradox converges to social welfare values much higher than that of the Nash equilibrium (NE), which is 0 after rescaling. Values of social welfare were rescaled from $[-2,-1.5]\rightarrow[0,1]$; higher social welfare is better. Results replicated from carissimo2024counter.
  • Figure 4: As each user learns a $q$-table of recommendation--action pairs, the recommender chooses the row of the $q$-table by recommending $s_i$ which determines the row to which the policy $\pi_i$ is applied. Time subscripts are omitted for clarity. This is a particular case where the number of actions equals the number of recommendations ($k=m$).
  • Figure 5: Top: Social welfare achieved in Braess's Paradox while varying the number of $Q$-learners, the size of the recommendation space, and the type of recommender (optimized, random, or none). Bottom: Evolution of the social welfare for four selected conditions. Values of social welfare were rescaled from $[-2,-1.5]\rightarrow[0,1]$; higher social welfare is better.
  • ...and 7 more figures
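
Figures 3 and 4 describe $\epsilon$-greedy tabular $Q$-learners whose $q$-table row is selected by a recommendation. The following minimal Python sketch simulates that loop under several assumptions not stated here:

- zero-initialized $q$-tables;
- $\epsilon = 0.01$;
- reward equal to the negative travel cost under the standard Braess edge costs;
- the next round's recommendation used as the bootstrapping state.

The recommender interface `recommender(t, n_agents, m, rng)` and `random_recommender` are hypothetical. The paper's optimized recommender is not reproduced.

```python
import random

UP, DOWN, CROSS = 0, 1, 2
N_ACTIONS = 3


def braess_costs(actions):
    """Per-action travel cost, assuming the standard Braess edge costs."""
    n = len(actions)
    first_upper = sum(a in (UP, CROSS) for a in actions) / n
    second_lower = sum(a in (DOWN, CROSS) for a in actions) / n
    return [first_upper + 1.0, 1.0 + second_lower, first_upper + second_lower]


def epsilon_greedy(row, epsilon, rng):
    """Explore uniformly with probability epsilon, else take the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(row))
    return max(range(len(row)), key=lambda a: row[a])


def random_recommender(t, n_agents, m, rng):
    """Hypothetical baseline: an independent uniform recommendation per agent."""
    return [rng.randrange(m) for _ in range(n_agents)]


def simulate(n_agents=100, m=3, steps=2000, alpha=0.1, gamma=0.8,
             epsilon=0.01, recommender=random_recommender, seed=0):
    """Return the (unscaled) social welfare of every round."""
    rng = random.Random(seed)
    # q[i][s][a]: agent i's value for action a after seeing recommendation s.
    q = [[[0.0] * N_ACTIONS for _ in range(m)] for _ in range(n_agents)]
    recs = recommender(0, n_agents, m, rng)
    welfare = []
    for t in range(steps):
        # The recommendation selects the q-table row each agent acts on.
        actions = [epsilon_greedy(q[i][recs[i]], epsilon, rng)
                   for i in range(n_agents)]
        costs = braess_costs(actions)
        welfare.append(-sum(costs[a] for a in actions) / n_agents)
        # Assumption: the next round's recommendation serves as the next state.
        next_recs = recommender(t + 1, n_agents, m, rng)
        for i, a in enumerate(actions):
            s, s_next = recs[i], next_recs[i]
            target = -costs[a] + gamma * max(q[i][s_next])
            q[i][s][a] += alpha * (target - q[i][s][a])
        recs = next_recs
    return welfare
```

For example, `simulate(recommender=lambda t, n, m, rng: [0] * n)` runs a constant recommender, which keeps every agent on a single $q$-table row. Replacing it with a time-varying recommender corresponds to the kind of intervention the paper optimizes.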

Theorems & Definitions (6)

  • Definition 3.1
  • Theorem 3.2
  • Theorem 3.4
  • Definition 3.5
  • proof
  • proof