
Decentralized Online Learning in General-Sum Stackelberg Games

Yaolong Yu, Haipeng Chen

TL;DR

This work investigates online, decentralized learning in repeated general-sum Stackelberg games with a leader–follower structure under two follower information regimes. It develops algorithms for both players and proves last-iterate convergence and sample complexity bounds, covering a myopic follower with limited information and a manipulative follower with side information (omniscient or noisy). The FBM and FMUCB strategies reveal intrinsic follower advantages when information about the leader’s rewards is available, highlighting that manipulation can outperform best-response dynamics in online settings. Empirical results on synthetic games corroborate the theoretical guarantees, demonstrating convergence and measurable gains for the follower under manipulation strategies. Overall, the paper advances understanding of how information asymmetry and strategic behavior affect learning dynamics in Stackelberg games and provides practical online-learning tools for decentralized settings.

Abstract

We study an online learning problem in general-sum Stackelberg games, where players act in a decentralized and strategic manner. We consider two settings that differ in the information available to the follower: (1) the limited information setting, where the follower only observes its own reward, and (2) the side information setting, where the follower has extra side information about the leader's reward. We show that myopically best responding to the leader's action is the follower's best strategy in the limited information setting, but not necessarily in the side information setting: there, the follower can manipulate the leader's reward signals with strategic actions and hence induce the leader's strategy to converge to an equilibrium that is more favorable to the follower. Based on these insights, we study decentralized online learning for both players in the two settings. Our main contribution is to derive last-iterate convergence and sample complexity results in both settings. Notably, we design a new manipulation strategy for the follower in the side information setting and show that it has an intrinsic advantage over the best response strategy. Our theoretical results are also supported by empirical results.
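To make the contrast between best response and manipulation concrete, here is a minimal sketch on a hypothetical 2x2 game. The reward tables, function names, and the brute-force search over response maps are illustrative assumptions, not the paper's algorithms. The sketch also assumes the leader ends up playing the action that maximizes its own mean reward against the follower's fixed response map, which stands in for the limit of the leader's learning process.

```python
from itertools import product

# Hypothetical mean-reward tables, indexed [leader_action][follower_action].
# These numbers are made up for illustration; they are not from the paper.
MU_LEADER = [[0.3, 0.1],
             [0.9, 0.2]]
MU_FOLLOWER = [[1.0, 0.2],
               [0.6, 0.5]]


def leader_choice(response):
    """Leader action that maximizes the leader's reward against a response map.

    `response[a]` is the follower action played whenever the leader plays `a`.
    """
    return max(range(len(MU_LEADER)), key=lambda a: MU_LEADER[a][response[a]])


def outcome(response):
    """Return (leader action, follower action, leader reward, follower reward)."""
    a = leader_choice(response)
    b = response[a]
    return a, b, MU_LEADER[a][b], MU_FOLLOWER[a][b]


def best_response_map():
    """Myopic follower: maximize its own reward separately for each leader action."""
    return tuple(max(range(len(row)), key=lambda b: row[b]) for row in MU_FOLLOWER)


def best_manipulation_map():
    """Follower that knows MU_LEADER: pick the response map whose induced
    outcome gives the follower the highest reward (brute-force enumeration)."""
    n_follower_actions = len(MU_FOLLOWER[0])
    maps = product(range(n_follower_actions), repeat=len(MU_FOLLOWER))
    return max(maps, key=lambda r: outcome(r)[3])


br = outcome(best_response_map())
manip = outcome(best_manipulation_map())
print("best response:", br)
print("manipulation: ", manip)
```

In this toy game the myopic follower ends up with reward 0.6, because the leader settles on action 1. A follower that answers leader action 1 with the response that is worse for the leader steers the leader to action 0 and collects 1.0 instead. Enumerating all response maps is only feasible for tiny action sets; it is used here purely to keep the illustration short.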

Paper Structure

This paper contains 25 sections, 15 theorems, 71 equations, 3 figures, 2 tables, and 3 algorithms.

Key Result

Proposition 1

In a repeated general-sum Stackelberg game, if the follower uses UCB as the learning algorithm, with probability at least $1-\delta$, the follower's regret is bounded as:

Figures (3)

  • Figure 1: Limited information
  • Figure 2: Omniscient follower
  • Figure 3: Noisy side information

Theorems & Definitions (20)

  • Proposition 1
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Proposition 2
  • Proposition 3
  • Theorem 4: Last iterate convergence of EXP3-FMUCB with noisy side information
  • Theorem 5: Last iterate convergence of UCBE-FMUCB with noisy side information
  • Theorem : Last iterate convergence of EXP3-UCB under limited information
  • Lemma 1
  • ...and 10 more