To Analyze and Regulate Human-in-the-loop Learning for Congestion Games
Hongbo Li, Lingjie Duan
TL;DR
The paper addresses inefficiencies in congestion games arising from selfish, time-varying routing decisions augmented by crowdsourced learning. By formulating the problem as POMDPs for both myopic and socially optimal policies, it identifies threshold-based exploration behavior and derives a PoA lower bound that grows with the discount factor $\rho$. The core contribution is a Selective Information Disclosure (SID) mechanism that reveals the latest traffic information only when users would over-explore stochastic paths, bounding the PoA to at most $\frac{1}{1-\frac{\rho^{1/\lambda}}{2}}$ (which is below $2$ for any $\rho < 1$). The authors extend the analysis to general linear path graphs with time-varying Markov dynamics and validate the approach on real traffic data. There, SID achieves close-to-optimal performance (within about $20\%$ of the optimum) and significantly outperforms baseline information-sharing and myopic policies, which suggests practical relevance for mobile crowdsourcing in traffic management.
Abstract
In congestion games, selfish users behave myopically and crowd onto the shortest paths, and the social planner designs mechanisms to regulate such selfish routing through information or payment incentives. However, such mechanism design requires knowledge of time-varying traffic conditions, and it is the users themselves who learn and report past road experiences to the social planner (e.g., Waze or Google Maps). When congestion games meet mobile crowdsourcing, it is critical to incentivize selfish users to explore non-shortest paths under the best exploitation-exploration trade-off. First, we consider a simple but fundamental parallel routing network with one deterministic path and multiple stochastic paths for users with an average arrival probability $\lambda$. We prove that the current myopic routing policy (widely used in Waze and Google Maps) misses both exploration (when the hazard belief is strong) and exploitation (when the hazard belief is weak) as compared to the social optimum. Due to the myopic policy's under-exploration, we prove that the resulting price of anarchy (PoA) is larger than $\frac{1}{1-\rho^{\frac{1}{\lambda}}}$, which can be arbitrarily large as the discount factor $\rho \rightarrow 1$. To mitigate this huge efficiency loss, we propose a novel selective information disclosure (SID) mechanism: we reveal the latest traffic information to users only when they intend to over-explore stochastic paths upon arrival, and hide such information when they want to under-explore. We prove that our mechanism successfully reduces the PoA to less than $2$. Besides the parallel routing network, we further extend our mechanism and PoA results to any linear path graphs with multiple intermediate nodes.
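
To get a feel for the two PoA expressions quoted above, the following minimal sketch evaluates them numerically. It is only an illustration, not the authors' code: the function names are hypothetical, and it assumes $0 < \rho < 1$ and $0 < \lambda \le 1$. The myopic expression $\frac{1}{1-\rho^{1/\lambda}}$ is the lower bound from the abstract, and the SID expression $\frac{1}{1-\rho^{1/\lambda}/2}$ is the upper bound stated in the TL;DR.

```python
def myopic_poa_lower_bound(rho: float, lam: float) -> float:
    """Evaluate 1 / (1 - rho**(1/lam)), the quoted lower bound on the myopic PoA.

    Assumes 0 < rho < 1 (discount factor) and 0 < lam <= 1 (arrival probability).
    """
    if not (0.0 < rho < 1.0) or not (0.0 < lam <= 1.0):
        raise ValueError("expected 0 < rho < 1 and 0 < lam <= 1")
    return 1.0 / (1.0 - rho ** (1.0 / lam))


def sid_poa_upper_bound(rho: float, lam: float) -> float:
    """Evaluate 1 / (1 - rho**(1/lam) / 2), the quoted upper bound on the PoA under SID."""
    if not (0.0 < rho < 1.0) or not (0.0 < lam <= 1.0):
        raise ValueError("expected 0 < rho < 1 and 0 < lam <= 1")
    return 1.0 / (1.0 - rho ** (1.0 / lam) / 2.0)


if __name__ == "__main__":
    lam = 0.5  # illustrative arrival probability, not a value from the paper
    for rho in (0.5, 0.9, 0.99, 0.999):
        print(
            f"rho={rho}: myopic bound={myopic_poa_lower_bound(rho, lam):.3f}, "
            f"SID bound={sid_poa_upper_bound(rho, lam):.3f}"
        )
```

As $\rho$ approaches $1$, the first expression grows without bound, while the second stays strictly below $2$ because $\rho^{1/\lambda} < 1$.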
