Regret Minimization in Stackelberg Games with Side Information
Keegan Harris, Zhiwei Steven Wu, Maria-Florina Balcan
TL;DR
This work extends online learning in Stackelberg games to settings with side information by formalizing Stackelberg games with contexts and follower types. It proves an impossibility result: no-regret learning is unattainable when both contexts and follower types are chosen adversarially. To enable learning, the paper discretizes the leader's policy space into finite, context-dependent sets and analyzes two natural relaxations: stochastic followers with adversarial contexts, and stochastic contexts with adversarial followers. It provides regret guarantees for these relaxations via greedy estimation and Hedge over policies. It further extends the results to bandit feedback, using barycentric spanners to construct low-variance estimators and achieving regret of order $\tilde{O}(T^{2/3})$. In simulations, the proposed methods outperform non-contextual baselines, which suggests practical value for security, wildlife protection, and related applications where side information is available.
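The "Hedge over policies" idea mentioned above can be illustrated with a minimal sketch of the standard Hedge (multiplicative weights) update over a finite policy set. This is the generic textbook algorithm under full feedback, not necessarily the paper's exact procedure. The names `hedge_weights`, `run_hedge`, `utility_rounds`, and the learning rate `eta` are hypothetical, and utilities are assumed to lie in [0, 1].

```python
import math


def hedge_weights(cumulative_utilities, eta):
    """Return the Hedge distribution over a finite set of policies.

    Each policy's probability is proportional to exp(eta * cumulative utility).
    """
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the normalized distribution.
    m = max(cumulative_utilities)
    exps = [math.exp(eta * (u - m)) for u in cumulative_utilities]
    total = sum(exps)
    return [e / total for e in exps]


def run_hedge(utility_rounds, eta):
    """Run Hedge with full feedback (an assumption of this sketch).

    utility_rounds[t][i] is the utility policy i would have earned in round t.
    Returns (expected total utility, regret against the best fixed policy).
    """
    n = len(utility_rounds[0])
    cumulative = [0.0] * n
    expected_total = 0.0
    for utilities in utility_rounds:
        # Sample distribution for this round uses only past utilities.
        probs = hedge_weights(cumulative, eta)
        expected_total += sum(p * u for p, u in zip(probs, utilities))
        # Full feedback: every policy's utility is observed and accumulated.
        cumulative = [c + u for c, u in zip(cumulative, utilities)]
    best_fixed = max(cumulative)
    return expected_total, best_fixed - expected_total
```

In the contextual setting described above, each "policy" would be one element of the discretized, context-dependent policy set; how that set is built is specific to the paper and is not reproduced here.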
Abstract
Algorithms for playing in Stackelberg games have been deployed in real-world domains including airport security, anti-poaching efforts, and cyber-crime prevention. However, these algorithms often fail to take into consideration the additional information available to each player (e.g., traffic patterns, weather conditions, network congestion), which may significantly affect both players' optimal strategies. We formalize such settings as Stackelberg games with side information, in which both players observe an external context before playing. The leader commits to a (context-dependent) strategy, and the follower best-responds to both the leader's strategy and the context. We focus on the online setting in which a sequence of followers arrives over time, and the context may change from round to round. In sharp contrast to the non-contextual version, we show that it is impossible for the leader to achieve no regret in the fully adversarial setting. Motivated by this result, we show that no-regret learning is possible in two natural relaxations: the setting in which the sequence of followers is chosen stochastically and the sequence of contexts is adversarial, and the setting in which contexts are stochastic and follower types are adversarial.
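To make the interaction described above concrete, the following is a minimal sketch of one round of a Stackelberg game with side information: the leader commits to a mixed strategy, and the follower best-responds given that strategy and the observed context. Everything here is illustrative and assumed rather than taken from the paper: the two-target security game, the utility functions `example_follower_utility` and `example_leader_utility`, the scalar context, and the tie-breaking rule (first maximizing action in the list).

```python
def follower_best_response(leader_mixed, context, follower_utility, follower_actions):
    """Return the follower action maximizing expected utility.

    leader_mixed[a] is the probability the leader plays action a.
    Ties go to the first maximizing action (an assumption of this sketch).
    """
    def expected(b):
        return sum(p * follower_utility(context, a, b)
                   for a, p in enumerate(leader_mixed))
    return max(follower_actions, key=expected)


def leader_expected_utility(leader_mixed, context, leader_utility,
                            follower_utility, follower_actions):
    """Leader's expected utility when the follower best-responds."""
    b = follower_best_response(leader_mixed, context,
                               follower_utility, follower_actions)
    return sum(p * leader_utility(context, a, b)
               for a, p in enumerate(leader_mixed))


# Hypothetical two-target security game: the leader protects one target,
# the follower attacks one target. The context rescales target 0's value.
def target_value(context, target):
    return context if target == 0 else 1.0


def example_follower_utility(context, a, b):
    # An attack on a protected target yields nothing.
    return 0.0 if a == b else target_value(context, b)


def example_leader_utility(context, a, b):
    # The leader loses the value of a successfully attacked target.
    return 0.0 if a == b else -target_value(context, b)
```

With a uniform leader strategy `[0.5, 0.5]`, the follower's preferred target in this toy game switches as the context changes the relative value of target 0, which is the kind of context dependence a non-contextual leader strategy cannot exploit.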
