
No-Regret Learning in Stackelberg Games with an Application to Electric Ride-Hailing

Anna Maddux, Marko Maljkovic, Nikolas Geroliminis, Maryam Kamgarpour

TL;DR

This work addresses learning in single-leader multi-follower Stackelberg games when the lower-level game is unknown and viewed as a black box. It introduces a no-regret algorithm that leverages Gaussian process regression under RKHS regularity to converge to an $ε$-Stackelberg equilibrium in $O(\sqrt{T})$ rounds, without requiring private follower utilities. The method accommodates approximate Nash responses and uses bandit feedback, providing theoretical guarantees and practical viability. A numerical study in electric ride-hailing pricing demonstrates robustness to lower-level approximation errors and validates the approach in a realistic setting.
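The sketch below is a hedged illustration of what an "approximate Nash response" from the lower level could look like. For a fixed leader price, it runs simultaneous projected gradient play in a toy concave game among three followers. The quadratic utilities, the parameters `a`, `b`, `c`, `x_max`, `step`, and the function name `approx_nash` are all hypothetical and do not come from the paper. The stopping tolerance `tol` stands in for the lower-level approximation error.

```python
import numpy as np


def approx_nash(price, n_followers=3, a=1.0, b=5.0, c=0.5,
                x_max=10.0, step=0.1, tol=1e-6, max_iter=10_000):
    """Hypothetical toy lower level: follower i maximizes
    u_i = -a*x_i**2 + (b - price - c*sum_{j != i} x_j) * x_i over [0, x_max].
    Each u_i is concave in x_i, and with 2a > c the game is strongly monotone,
    so simultaneous projected gradient play converges to its unique Nash equilibrium."""
    x = np.zeros(n_followers)
    for _ in range(max_iter):
        others = x.sum() - x                      # sum_{j != i} x_j for every follower i
        grad = -2.0 * a * x + b - price - c * others
        x_new = np.clip(x + step * grad, 0.0, x_max)  # projected gradient step
        if np.max(np.abs(x_new - x)) < tol:       # loose tol => cruder approximate Nash
            return x_new
        x = x_new
    return x
```

For `price=1.0` the interior equilibrium of this toy game is $x_i = (b - p)/(2a + c(N-1)) = 4/3$ for every follower, and the iteration recovers it up to the tolerance. Loosening `tol` yields a cruder response, which mimics the varying approximation levels studied in the paper.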

Abstract

We consider the problem of efficiently learning to play single-leader multi-follower Stackelberg games when the leader lacks knowledge of the lower-level game. Such games arise in hierarchical decision-making problems involving self-interested agents. For example, in electric ride-hailing markets, a central authority aims to learn optimal charging prices to shape fleet distributions and charging patterns of ride-hailing companies. Existing works typically apply gradient-based methods to find the leader's optimal strategy. Such methods are impractical as they require that the followers share private utility information with the leader. Instead, we treat the lower-level game as a black box, assuming only that the followers' interactions approximate a Nash equilibrium while the leader observes the realized cost of the resulting approximation. Under kernel-based regularity assumptions on the leader's cost function, we develop a no-regret algorithm that converges to an $ε$-Stackelberg equilibrium in $O(\sqrt{T})$ rounds. Finally, we validate our approach through a numerical case study on optimal pricing in electric ride-hailing markets.
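The learning loop described in the abstract can be sketched as follows. Several assumptions here are ours rather than the paper's:

- The leader picks a scalar price from a finite grid.
- The black-box lower level is replaced by a hypothetical closed-form follower response with a small random perturbation, which stands in for an approximate Nash equilibrium.
- Prices are chosen by a Gaussian process lower confidence bound with a fixed `beta`. This is the cost-minimizing counterpart of GP-UCB (Srinivas et al., 2010).

As in the bandit setting, the leader only sees the realized, noisy cost. The names `rbf`, `gp_posterior`, `lower_level_cost`, and `learn_price` are hypothetical.

```python
import numpy as np


def rbf(A, B, ls=0.5):
    """Squared-exponential kernel between 1-D arrays A and B (unit prior variance)."""
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)


def gp_posterior(X, y, Xs, noise=0.1, ls=0.5):
    """Zero-mean GP posterior mean and standard deviation at test points Xs."""
    K = rbf(X, X, ls) + noise ** 2 * np.eye(len(X))
    Ks = rbf(Xs, X, ls)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = 1.0 - np.sum(v ** 2, axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))


def lower_level_cost(price, rng, n=3, target=3.0, noise=0.05):
    """Hypothetical black box: followers settle near the symmetric Nash point of a
    toy quadratic game, and the leader observes a noisy cost of the outcome.
    In expectation the cost is approximately (2 - price)**2, minimized at price 2."""
    x = max(5.0 - price, 0.0) / (2.0 + 0.5 * (n - 1))
    x_eq = x + rng.uniform(-0.01, 0.01, size=n)  # approximate, not exact, equilibrium
    return (x_eq.sum() - target) ** 2 + rng.normal(0.0, noise)


def learn_price(T=25, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 4.0, 81)  # candidate prices
    X, y = [], []
    for _ in range(T):
        if not X:
            p = rng.choice(grid)  # no data yet: pick a random price
        else:
            mu, sd = gp_posterior(np.array(X), np.array(y), grid)
            p = grid[np.argmin(mu - beta * sd)]  # optimistic choice for a cost
        X.append(p)
        y.append(lower_level_cost(p, rng))  # bandit feedback: realized cost only
    return np.array(X), np.array(y)
```

In this sketch, the leader never touches the followers' utilities. It only fits a GP to (price, realized cost) pairs, which mirrors the black-box treatment of the lower level described above.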

Paper Structure

This paper contains 9 sections, 1 theorem, 23 equations, 2 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Let Assumptions ass:concave_game through ass:cost_in_RKHS hold, set $\epsilon,\delta\in(0,1)$, and set $\beta^t$ equal to $B+(2L_JM/\sigma^2)\sqrt{2(\gamma^{t-1}+1+\log(2/\delta))}$, where the maximum information gain $\gamma^{t-1}$ is a kernel-dependent quantity defined in Srinivas et al. (2010).
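The confidence parameter $\beta^t$ in Theorem 1 is a closed-form expression, so it can be evaluated directly once its ingredients are known. The sketch below only transcribes that formula. The constants $B$, $L_J$, $M$ and $\sigma$ are defined in the paper's assumptions, which are not reproduced here. The information gain $\gamma^{t-1}$ must be supplied for the chosen kernel. The function name `beta_t` is hypothetical.

```python
import math


def beta_t(B, L_J, M, sigma, gamma_prev, delta):
    """Evaluate beta^t = B + (2 L_J M / sigma^2) * sqrt(2 (gamma^{t-1} + 1 + log(2/delta))).

    gamma_prev is the maximum information gain gamma^{t-1}; it depends on the kernel
    and must be computed or bounded separately."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    scale = 2.0 * L_J * M / sigma ** 2
    return B + scale * math.sqrt(2.0 * (gamma_prev + 1.0 + math.log(2.0 / delta)))
```

Since $\gamma^{t-1}$ is nondecreasing in $t$, so is $\beta^t$. For the squared-exponential kernel, Srinivas et al. (2010) bound the information gain by $O((\log T)^{d+1})$, so $\beta^t$ grows only slowly.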

Figures (2)

  • Figure 1: Illustration of the setup with 2 districts and 3 ride-hailing companies.
  • Figure 2: The top figure illustrates the average cumulative regret of the regulatory authority, $R^t/t$, while the bottom figure displays the leader’s objective, both over $T = 25$ iterations. The initial $N_{\text{warm}} = 5$ iterations correspond to a warm-up phase, during which pricing vectors are selected randomly in order to collect data for calibrating hyperparameters of the GP. Different colors represent varying levels of approximation error in computing the Nash equilibrium within the inner loop of Algorithm 1.
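The quantity in the top panel of Figure 2 can be computed from a log of per-round leader costs. This sketch assumes the standard definition of cumulative regret, $R^t = \sum_{s=1}^{t} (c^s - c^\star)$. Here $c^s$ is the leader's cost in round $s$ and $c^\star$ is the cost at a Stackelberg equilibrium. The paper's exact definition is not reproduced in this summary, and the function name is hypothetical.

```python
import numpy as np


def average_cumulative_regret(costs, opt_cost):
    """Return R^t / t for t = 1..len(costs), assuming R^t = sum_{s<=t} (c^s - c*)."""
    inst = np.asarray(costs, dtype=float) - opt_cost  # instantaneous regret per round
    R = np.cumsum(inst)                               # cumulative regret R^t
    return R / np.arange(1, len(inst) + 1)            # average cumulative regret R^t / t
```

The warm-up rounds (the first $N_{\text{warm}} = 5$ in Figure 2) are simply the first entries of `costs`. A sublinear $R^t$, such as the $O(\sqrt{T})$ rate in the abstract, shows up as $R^t/t$ decreasing toward zero.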

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Theorem 1
  • Proof