To Analyze and Regulate Human-in-the-loop Learning for Congestion Games

Hongbo Li; Lingjie Duan

TL;DR

The paper addresses the inefficiency of congestion games in which selfish users make time-varying routing decisions while traffic conditions are learned through crowdsourcing. By formulating the myopic and socially optimal policies as POMDPs, it identifies threshold-based exploration behavior and derives a PoA lower bound that grows with the discount factor $\rho$. The core contribution is a Selective Information Disclosure (SID) mechanism that reveals the latest latency information only when users would over-explore stochastic paths, bounding the PoA by $\frac{1}{1-\frac{\rho^{1/\lambda}}{2}}$ (i.e., below $2$). The authors extend the analysis to general linear path graphs with time-varying Markov dynamics and validate the approach on real traffic data, showing that SID achieves close-to-optimal performance (within about $20\%$ of the optimum) and significantly outperforms baseline information-sharing and myopic policies, which suggests practical value for mobile crowdsourcing in traffic management.

Abstract

In congestion games, selfish users myopically crowd onto the shortest paths, and the social planner designs mechanisms to regulate such selfish routing through information or payment incentives. However, such mechanism design requires knowledge of time-varying traffic conditions, and it is the users themselves who learn and report past road experiences to the social planner (e.g., Waze or Google Maps). When congestion games meet mobile crowdsourcing, it is critical to incentivize selfish users to explore non-shortest paths in the best exploitation-exploration trade-off. First, we consider a simple but fundamental parallel routing network with one deterministic path and multiple stochastic paths for users with an average arrival probability $\lambda$. We prove that the current myopic routing policy (widely used in Waze and Google Maps) misses both exploration (when the hazard belief is strong) and exploitation (when the hazard belief is weak) as compared to the social optimum. Due to the myopic policy's under-exploration, we prove that the resulting price of anarchy (PoA) is larger than $\frac{1}{1-\rho^{\frac{1}{\lambda}}}$, which can be arbitrarily large as the discount factor $\rho\rightarrow 1$. To mitigate such huge efficiency loss, we propose a novel selective information disclosure (SID) mechanism: we only reveal the latest traffic information to users when they intend to over-explore stochastic paths upon arrival, while hiding such information when they want to under-explore. We prove that our mechanism successfully reduces the PoA to less than $2$. Besides the parallel routing network, we further extend our mechanism and PoA results to any linear path graph with multiple intermediate nodes.
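
As a quick numeric illustration of the two bounds quoted above, the following minimal Python sketch evaluates the myopic lower bound $\frac{1}{1-\rho^{1/\lambda}}$ from the abstract and the SID bound $\frac{1}{1-\rho^{1/\lambda}/2}$ from the TL;DR. The function names and the sample values of $\rho$ and $\lambda$ are illustrative assumptions, not taken from the paper.

```python
def myopic_poa_lower_bound(rho: float, lam: float) -> float:
    """Evaluate 1 / (1 - rho**(1/lam)), the myopic PoA lower bound quoted in the abstract."""
    if not (0.0 < rho < 1.0 and 0.0 < lam <= 1.0):
        raise ValueError("expected 0 < rho < 1 and 0 < lam <= 1")
    return 1.0 / (1.0 - rho ** (1.0 / lam))


def sid_poa_upper_bound(rho: float, lam: float) -> float:
    """Evaluate 1 / (1 - rho**(1/lam) / 2), the SID PoA bound quoted in the TL;DR."""
    if not (0.0 < rho < 1.0 and 0.0 < lam <= 1.0):
        raise ValueError("expected 0 < rho < 1 and 0 < lam <= 1")
    return 1.0 / (1.0 - rho ** (1.0 / lam) / 2.0)


if __name__ == "__main__":
    lam = 1.0  # hypothetical arrival probability, chosen for illustration
    for rho in (0.5, 0.9, 0.99):  # hypothetical discount factors
        lower = myopic_poa_lower_bound(rho, lam)
        upper = sid_poa_upper_bound(rho, lam)
        print(f"rho={rho}: myopic PoA > {lower:.3f}, SID PoA bound = {upper:.3f}")
```

Since $\rho^{1/\lambda} < 1$, the SID expression always stays below $2$, whereas the myopic expression grows without bound as $\rho \rightarrow 1$.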

Paper Structure (31 sections, 10 theorems, 75 equations, 8 figures, 1 table)

This paper contains 31 sections, 10 theorems, 75 equations, 8 figures, 1 table.

Introduction
System Model
Dynamic Congestion Model
Crowdsourcing Model for Learning
POMDP Problem Formulations for Myopic and Socially Optimal Policies
Problem Formulation for Myopic Policy
Socially Optimal Policy Problem Formulation
Comparing Myopic Policy to Social Optimum for PoA Analysis
Selective Information Disclosure
Extensions to General Linear Path Graphs and Dynamic Markov Chains
Extensions of System Model
Analysis of Myopic and Socially Optimal Policies
SID Mechanism Design and Analysis
Experiment Validation Using Real Datasets
Conclusion
...and 16 more sections

Key Result

Lemma 1

The cost functions $C^{(m)}(\mathbf{L}(t), \mathbf{x}(t),s(t))$ in (cost_Cm) and $C^*(\mathbf{L}(t), \mathbf{x}(t),s(t))$ in (cost_C*) under both policies increase with any path's expected latency $\mathbb{E}[\ell_i(t)|x_i(t-1),y_i(t-1)]$ in $\mathbf{L}(t)$ and $\mathbf{x}(t)$ in (LX_set).

Figures (8)

Figure 1: At the beginning of each time slot $t\in\{1,2,\cdots\}$, a user arrives with an average arrival probability $\lambda$ to choose a path among $N+1$ paths in the transportation network in Fig. (fig:congestion_game). The current travel latency $\ell_i(t)$ of each path $i\in\{0,1,\cdots,N\}$ has linear correlation with last latency $\ell_i(t-1)$ and evolves according to current user choice in (L_0(t+1)) and (L_i(t+1)). Path 0 is a safe route and its latency has a fixed correlation coefficient $\alpha\in(0,1)$ to change from the last round. Yet any risky path $i\in\{1,\cdots,N\}$ has a stochastic correlation coefficient $\alpha_i(t)$, which alternates between low coefficient state $\alpha_L^i\in[0,1)$ and high state $\alpha^i_H\geq 1$ according to the partially observable Markov chain in Fig. (fig:POMDP).
Figure 2: The socially optimal policy's exploration threshold $\ell_1^*(t)$ and myopic policy's threshold $\ell^{(m)}(t)$ versus hazard belief $x_1(t)$ in a two-path transportation network with $N=1$. We set $\alpha=0.6,\alpha_H^1=1.2,\alpha_L^1=0.2,q_{LL}^1=0.5,q_{HH}^1=0.5,\Delta\ell=2,p_H=0.8,p_L=0.3$ and $\ell_0(t)=10$ at current time $t$.
Figure 3: Average inefficiency ratios $\gamma^{(m)}$ under the myopic policy in (cost_Cm), $\gamma^{(\emptyset)}$ under the hiding policy in (pi^empty), and $\gamma^{(\text{SID})}$ under our SID mechanism.
Figure 5: Average inefficiency ratios $\gamma^{(m)}$ under the myopic policy and $\gamma^{(\text{SID})}$ under our SID mechanism in the linear path graph. We vary the maximum variation $\sigma$ of transition probabilities in set $\{0,0.04,0.08,0.12,0.16,0.20\}$. Here $q_{HH}(t)$ and $q_{LL}(t)$ are drawn from uniform distributions on intervals $[\max\{0,q_H-\sigma\},\min\{1,q_H+\sigma\}]$ and $[\max\{0,q_L-\sigma\},\min\{1,q_L+\sigma\}]$, respectively.
Figure 6: A typical linear path network from Shanghai Station to Shanghai Tower, passing through the Bund as an intermediate node, includes several route options. From Shanghai Station to the Bund, travelers can choose between Haining Road, North Henan Road, and Middle Henan Road (in blue), or North-South Elevated Road and Yan'An Elevated Road (in red). From the Bund to Shanghai Tower, the options include Yan'An Road Tunnel and Middle Yincheng Road (in purple), or Renmin Road Tunnel (in black).
...and 3 more figures
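
The Figure 5 caption describes time-varying transition probabilities drawn uniformly from clipped intervals around $q_H$ and $q_L$. A minimal Python sketch of that sampling step, together with one step of the two-state (high/low) Markov chain from the Figure 1 caption, might look as follows. The function names, the string encoding of states, and the reading of $q_{HH}$ and $q_{LL}$ as probabilities of staying in the same state are assumptions made for illustration; the parameter values are borrowed from the Figure 2 and Figure 5 captions.

```python
import random


def sample_transition_prob(q_mean: float, sigma: float, rng: random.Random) -> float:
    """Draw q(t) uniformly from [max(0, q_mean - sigma), min(1, q_mean + sigma)]."""
    low = max(0.0, q_mean - sigma)
    high = min(1.0, q_mean + sigma)
    return rng.uniform(low, high)


def step_state(state: str, q_hh: float, q_ll: float, rng: random.Random) -> str:
    """Advance the coefficient state by one slot.

    Assumption: q_hh (q_ll) is the probability of remaining in state "H" ("L").
    """
    stay_prob = q_hh if state == "H" else q_ll
    if rng.random() < stay_prob:
        return state
    return "L" if state == "H" else "H"


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so the run is reproducible
    q_h, q_l, sigma = 0.5, 0.5, 0.20  # values appearing in the Figure 2 and Figure 5 captions
    state = "L"
    for t in range(5):
        q_hh_t = sample_transition_prob(q_h, sigma, rng)
        q_ll_t = sample_transition_prob(q_l, sigma, rng)
        state = step_state(state, q_hh_t, q_ll_t, rng)
        print(t, round(q_hh_t, 3), round(q_ll_t, 3), state)
```

Setting `sigma = 0` recovers a chain with fixed transition probabilities, since both interval endpoints then coincide with the mean.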

Theorems & Definitions (12)

Lemma 1
Proposition 1
Lemma 2
Proposition 2
Proposition 3
Proposition 4
Definition 1: Selective Information Disclosure (SID) Mechanism
Theorem 1
Corollary 1
Definition 2: Linear Path Graph [Broersma1989pathgraph]
...and 2 more
