Human-in-the-loop Learning for Dynamic Congestion Games

Hongbo Li; Lingjie Duan

TL;DR

This work proposes a combined hiding and probabilistic recommendation (CHAR) mechanism that hides all information from a selected user group and provides state-dependent probabilistic recommendations to the other user group, and proves that its PoA results carry over unchanged from parallel networks to linear path graphs.

Abstract

Today, mobile users learn and share their traffic observations via crowdsourcing platforms (e.g., Waze). Yet such platforms simply cater to selfish users' myopic interests by recommending the shortest path, and do not encourage enough users to travel and learn other paths for the benefit of future users. Prior studies focus on one-shot congestion games without considering users' information learning, whereas our work studies how users learn and alter traffic conditions on stochastic paths in a human-in-the-loop manner. Our analysis shows that the myopic routing policy leads to severe under-exploration of stochastic paths. This results in a price of anarchy (PoA) greater than $2$ compared with the socially optimal policy, which minimizes the long-term social cost. Moreover, the myopic policy fails to ensure correct learning convergence of users' traffic hazard beliefs. To address this, we focus on informational (non-monetary) mechanisms, as they are easier to implement than pricing. We first show that existing information-hiding mechanisms and deterministic path-recommendation mechanisms from the Bayesian persuasion literature do not work, and can even lead to $\text{PoA}=\infty$. Accordingly, we propose a new combined hiding and probabilistic recommendation (CHAR) mechanism that hides all information from a selected user group and provides state-dependent probabilistic recommendations to the other user group. Our CHAR mechanism ensures a PoA less than $\frac{5}{4}$, which cannot be further reduced by any other informational (non-monetary) mechanism. Beyond the parallel network, we extend our analysis and CHAR to more general linear path graphs with multiple intermediate nodes, and we prove that the PoA results remain unchanged. Additionally, we carry out experiments with real-world datasets to further extend our routing graphs and verify the close-to-optimal performance of CHAR.
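
For concreteness, the PoA quoted in the abstract can be read as a worst-case cost ratio. The following is only a sketch in assumed notation: the symbols $C^{(m)}$, $C^{*}$, and the parameter set $\Theta$ are placeholders introduced here for illustration and are not taken from the paper.

```latex
% Assumed notation (illustrative, not the paper's):
%   C^{(m)}(\theta) : long-term expected social cost under the myopic policy
%   C^{*}(\theta)   : long-term expected social cost under the socially optimal policy
%   \Theta          : set of admissible system parameters
\text{PoA} \;=\; \max_{\theta \in \Theta} \frac{C^{(m)}(\theta)}{C^{*}(\theta)} \;\geq\; 1
```

Under this reading, a PoA greater than $2$ means that in the worst case the myopic policy incurs more than twice the optimal long-term social cost, whereas a PoA below $\frac{5}{4}$ keeps the worst-case loss under 25%.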

Paper Structure (36 sections, 13 theorems, 68 equations, 8 figures, 1 table)

Introduction
System Model
Dynamic Congestion Model
Human-in-the-loop Learning Model
Problem Formulations for Myopic and Socially Optimal Policies
Problem Formulation for the Myopic Policy
Problem Formulation for Socially Optimal Policy
Policies Comparison via PoA Analysis
Exploration and Exploitation Comparison
PoA Analysis
CHAR Mechanism with Learning Convergence
Learning Convergence Analysis
Benchmark Informational Mechanisms Comparison
New CHAR Mechanism Design and Analysis
Extensions of Parallel Transportation Network to Linear Path Graphs
...and 21 more sections

Key Result

Lemma 1

Under the myopic policy, given expected travel latency ${\mathbb{E}[\ell_1(t)|x_1'(t-1)]}$ and hazard belief ${x_1(t)}$ of stochastic path 1, the exploration number is:

Figures (8)

Figure 1: Dynamic congestion model: within each time slot $t\in\{0,1,\cdots\}$, a random number $N(t)$ of users arrive at origin O to decide among safe path 0 and any stochastic path ${i\in\mathcal{M}:=\{1,\cdots, M\}}$ to travel to destination D.
Figure 2: Theoretical results of the myopic policy and the socially optimal policy in Section 4.
Figure 3: Exploration numbers $n_1^*(t)$ under the socially optimal policy and $n_1^{(m)}(t)$ under the myopic policy versus hazard belief $x_1(t)$ in an illustrative two-path transportation network with $M=1$.
Figure 4: Dynamics of average hazard belief $x_1(t)$ under both myopic and socially optimal policies from $t=0$ to $30$ in a two-path network with $M=1$.
Figure 5: Generalization from the parallel graph in Fig. 1 to a linear path graph: at the beginning of each time slot $t\in\{1,2,\cdots\}$, $N_j(t)$ users arrive at any node $\text{D}_j\in\{\text{O},\text{D}_1,\cdots, \text{D}_k\}$, and each selects one of the $M+1$ available paths to travel to the next node in this linear path graph. Among all the $M+1$ paths between intermediate nodes $\text{D}_{j}$ and $\text{D}_{j+1}$, where $j\in\{0,\cdots,k\}$, path $0^j$ is safe while any path $i^j\in\{1^j,\cdots,M^j\}$ is stochastic.
...and 3 more figures

Theorems & Definitions (17)

Definition 1: Myopic Policy
Lemma 1
Lemma 2
Lemma 3
Definition 2: Under/over exploration
Proposition 1
Theorem 1
Lemma 4
Proposition 2
Definition 3: Informational mechanisms
...and 7 more
