Follower Agnostic Methods for Stackelberg Games

Chinmay Maheshwari, James Cheng, S. Shankar Sastry, Lillian Ratliff, Eric Mazumdar

TL;DR

An efficient, follower-agnostic algorithm for solving online Stackelberg games with multiple followers. It relies on a gradient estimator that the leader builds by probing followers with specially designed strategies.

Abstract

In this paper, we present an efficient algorithm to solve online Stackelberg games with multiple followers in a follower-agnostic manner. Unlike previous works, our approach works even when the leader has no knowledge of the followers' utility functions or strategy spaces. Our algorithm introduces a gradient estimator that leverages specially designed strategies to probe followers. In a departure from the traditional assumption of optimal play, we model followers' responses using a convergent adaptation rule, which allows for realistic and dynamic interactions. The leader constructs the gradient estimator solely from observations of followers' actions. We provide non-asymptotic convergence rates to stationary points of the leader's objective and show asymptotic convergence to a local Stackelberg equilibrium. To validate the algorithm, we apply it to the problem of incentive design on a large-scale transportation network, showcasing its robustness even when the leader lacks access to followers' demand.
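The probing idea in the abstract can be illustrated with a generic two-point zeroth-order estimator. The sketch below is not the paper's construction: the follower adaptation rule (`follower_adapt`), the leader's cost (`leader_cost`), and all constants are hypothetical stand-ins chosen so the example runs on its own. The only property it shares with the described setting is that the leader never reads the followers' utilities and only evaluates its objective at observed follower actions.

```python
import math
import random


def follower_adapt(x, y, steps=50, rate=0.5):
    """Hypothetical convergent adaptation rule (an assumption, not from the paper).

    Followers drift toward 0.5 * x. The leader does not know this target;
    it only observes the returned actions.
    """
    for _ in range(steps):
        y = [yi + rate * (0.5 * xi - yi) for xi, yi in zip(x, y)]
    return y


def leader_cost(x, y):
    """Hypothetical leader objective, evaluated at observed follower actions."""
    return sum((yi - 1.0) ** 2 for yi in y) + 0.1 * sum(xi ** 2 for xi in x)


def random_unit_vector(d, rng):
    """Draw a direction uniformly from the unit sphere in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm for vi in v]


def two_point_gradient(x, y, delta, rng):
    """Generic two-point estimate built from two probing strategies x +/- delta * v."""
    d = len(x)
    v = random_unit_vector(d, rng)
    x_plus = [xi + delta * vi for xi, vi in zip(x, v)]
    x_minus = [xi - delta * vi for xi, vi in zip(x, v)]
    # Followers adapt to each probe; the leader records only their actions.
    y_plus = follower_adapt(x_plus, y)
    y_minus = follower_adapt(x_minus, y)
    diff = leader_cost(x_plus, y_plus) - leader_cost(x_minus, y_minus)
    scale = d * diff / (2.0 * delta)
    return [scale * vi for vi in v], y_plus


def run_leader(d=2, iters=500, eta=0.05, delta=0.1, seed=0):
    """Leader descent loop using only the follower-agnostic estimate."""
    rng = random.Random(seed)
    x = [0.0] * d
    y = [0.0] * d
    for _ in range(iters):
        grad, y = two_point_gradient(x, y, delta, rng)
        x = [xi - eta * gi for xi, gi in zip(x, grad)]
    return x, follower_adapt(x, y)


if __name__ == "__main__":
    x_final, y_final = run_leader()
    print("leader strategy:", x_final)
    print("follower actions:", y_final)
```

In this toy problem the induced leader objective is quadratic with minimizer 1/0.7 in each coordinate, so the loop can be checked against a known answer. A constant step size is used here purely for simplicity.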

Paper Structure

This paper contains 20 sections, 13 theorems, 78 equations, 2 figures, and 1 algorithm.

Key Result

Theorem 1

Let Assumptions assm: BasicAssumptionSetup-assm: FollowerUpdatesConvergence hold. If we choose $\eta_t = \bar{\eta} (t+1)^{-1/2}d^{-1}$ and $\delta_t = \bar{\delta} (t+1)^{-1/4}d^{-1/2}$ such that $\bar{\eta}\leq d/2\tilde{\ell}$, then the updates $(x_t)_{t\in[T]}$ in Algorithm alg: ZerothOrderTwoPointAlg, where $\alpha = CK^{-\lambda}$ if Assumption assm: FollowerUpdatesConvergence(1a) holds, or $\alpha$ ...
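The step-size and perturbation schedules named in Theorem 1 can be written out directly. The following is a minimal sketch: the function names are hypothetical, and the condition on $\bar{\eta}$ is not checked because $\tilde{\ell}$ is not defined in this excerpt.

```python
def step_size(t, d, eta_bar):
    """eta_t = eta_bar * (t + 1)^(-1/2) * d^(-1), as stated in Theorem 1."""
    return eta_bar * (t + 1) ** -0.5 / d


def perturbation_radius(t, d, delta_bar):
    """delta_t = delta_bar * (t + 1)^(-1/4) * d^(-1/2), as stated in Theorem 1."""
    return delta_bar * (t + 1) ** -0.25 / d ** 0.5


if __name__ == "__main__":
    d = 4  # dimension of the leader's strategy (illustrative value)
    for t in (0, 15, 255):
        print(t, step_size(t, d, eta_bar=1.0), perturbation_radius(t, d, delta_bar=1.0))
```

With these choices $\delta_t^2 = (\bar{\delta}^2/\bar{\eta})\,\eta_t$, so the squared perturbation radius shrinks at the same rate as the step size.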

Figures (2)

  • Figure 1: Schematic depiction of the Sioux Falls transportation network. The numbers on the edges and nodes are identifiers.
  • Figure 2: The evolution of the planner's objective function over the iterates of the algorithm. The shaded blue region denotes the confidence interval calculated over 12 runs.

Theorems & Definitions (17)

  • Remark 1
  • Remark 2
  • Remark 3
  • Theorem 1
  • Corollary 1
  • Corollary 2
  • Remark 4
  • Lemma 1
  • Lemma 2
  • Theorem 2
  • ...and 7 more