Learning in Structured Stackelberg Games
Maria-Florina Balcan, Kiriaki Fragkia, Keegan Harris
TL;DR
This work introduces structured Stackelberg games where contextual information predicts the follower's type and provides a complete learnability theory for both online and distributional settings. The authors define the Stackelberg-Littlestone dimension (SLdim) to capture the joint complexity of the hypothesis class and the Stackelberg payoff structure, and present the Stackelberg Standard Optimal Algorithm (SSOA) that achieves instance-optimal regret $\text{SLdim}_{\mathcal{G}}(\mathcal{H})$ in the online setting. For distributional learning, they define the $\gamma$-valued SN and SG dimensions to establish matching lower and upper bounds on sample complexity, giving a PAC-style learner $\mathfrak{L}^*$ with performance guarantees that scale with these dimensions. The results reveal that SLdim can be strictly smaller than the classical Littlestone dimension, enabling learnability where traditional multiclass tools fail, and they connect these ideas to broader settings such as auctions with side information and Bayesian persuasion. Overall, the paper provides a principled framework and provable algorithms for learning in structured Stackelberg environments with contextual information, with implications for security, AI safety, and related economic settings.
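The SSOA's name echoes Littlestone's Standard Optimal Algorithm (SOA) for online binary classification. As a point of reference only, the sketch below implements that classical SOA together with the recursive Littlestone dimension for a small finite hypothesis class. It is not the paper's SSOA, which works with SLdim and the Stackelberg payoff structure. Representing hypotheses as 0/1 label tuples over a finite domain, and the names `ldim` and `soa`, are illustrative choices of ours.

```python
from functools import lru_cache
from itertools import permutations


@lru_cache(maxsize=None)
def ldim(H):
    """Littlestone dimension of a finite class H.

    H is a frozenset of hypotheses; each hypothesis is a tuple of 0/1 labels
    indexed by the points of a finite domain {0, ..., n-1}.
    """
    if not H:
        return -1  # convention: the empty class has dimension -1
    n = len(next(iter(H)))
    best = 0  # a nonempty class that no point splits has dimension 0
    for x in range(n):
        H0 = frozenset(h for h in H if h[x] == 0)
        H1 = frozenset(h for h in H if h[x] == 1)
        if H0 and H1:  # x can serve as the root of a shattered mistake tree
            best = max(best, 1 + min(ldim(H0), ldim(H1)))
    return best


def soa(H, stream):
    """Run the classical SOA on a realizable stream of (x, y) pairs.

    Predicts the label whose restricted version space has the larger
    Littlestone dimension and returns the number of mistakes made.
    """
    V = frozenset(H)
    mistakes = 0
    for x, y in stream:
        V0 = frozenset(h for h in V if h[x] == 0)
        V1 = frozenset(h for h in V if h[x] == 1)
        pred = 1 if ldim(V1) > ldim(V0) else 0  # ties go to label 0
        mistakes += int(pred != y)
        V = V1 if y == 1 else V0  # keep only hypotheses consistent with y
    return mistakes


# Usage: thresholds h_t(x) = 1[x >= t] on the domain {0, 1, 2, 3}.
THRESHOLDS = frozenset(tuple(int(x >= t) for x in range(4)) for t in range(5))
worst = max(
    soa(THRESHOLDS, [(x, h[x]) for x in order])
    for h in THRESHOLDS
    for order in permutations(range(4))
)
print(ldim(THRESHOLDS), worst)
```

On this threshold class, the worst-case mistake count over all targets and presentation orders equals the Littlestone dimension, consistent with SOA's guarantee of at most $\text{Ldim}(\mathcal{H})$ mistakes in the realizable case.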
Abstract
We initiate the study of structured Stackelberg games, a novel form of strategic interaction between a leader and a follower where contextual information can be predictive of the follower's (unknown) type. Motivated by applications such as security games and AI safety, we show how this additional structure can help the leader learn a utility-maximizing policy in both the online and distributional settings. In the online setting, we first prove that standard learning-theoretic measures of complexity do not characterize the difficulty of the leader's learning task. We then show that there exists a learning-theoretic measure of complexity, analogous to the Littlestone dimension in online classification, that tightly characterizes the leader's instance-optimal regret. We term this the Stackelberg-Littlestone dimension and leverage it to provide a provably optimal online learning algorithm. In the distributional setting, we provide analogous results by showing that two new dimensions control the upper and lower bounds on the leader's sample complexity.
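To make the setting concrete, here is a toy sketch of how a predicted follower type feeds into the leader's commitment. Everything in it is an assumption of ours, not taken from the paper: two leader actions, two follower actions, two follower types with made-up integer payoffs, a hypothetical threshold rule `predict_type` standing in for a hypothesis $h \in \mathcal{H}$, and a grid search over the leader's mixed strategies. Follower ties are broken in the leader's favor. It illustrates a single round only; it is not the paper's learning algorithm.

```python
from fractions import Fraction

# Hypothetical toy payoffs (not from the paper). The leader plays action 0
# with probability p and action 1 with probability 1 - p.
LEADER_U = [[2, 4],   # LEADER_U[leader_action][follower_action]
            [1, 3]]
FOLLOWER_U = {        # FOLLOWER_U[type][leader_action][follower_action]
    "A": [[1, 0], [0, 1]],
    "B": [[0, 1], [1, 0]],
}


def expected(U, p, j):
    """Expected payoff in column j when the leader plays action 0 w.p. p."""
    return p * U[0][j] + (1 - p) * U[1][j]


def best_response(ftype, p):
    """Follower's best response; ties are broken in the leader's favor."""
    U = FOLLOWER_U[ftype]
    return max(range(2), key=lambda j: (expected(U, p, j), expected(LEADER_U, p, j)))


def leader_utility(ftype, p):
    """Leader's expected utility from committing to p against type ftype."""
    return expected(LEADER_U, p, best_response(ftype, p))


def best_commitment(ftype, grid=20):
    """Best commitment on the grid {0, 1/grid, ..., 1} for a known type."""
    candidates = [Fraction(k, grid) for k in range(grid + 1)]
    return max(candidates, key=lambda p: leader_utility(ftype, p))


def predict_type(context):
    """Hypothetical hypothesis h: maps a scalar context to a follower type."""
    return "A" if context < 0.5 else "B"


def policy(context):
    """Leader's commitment: the best grid strategy against the predicted type."""
    return best_commitment(predict_type(context))
```

With these payoffs, the best commitment against type A is a 50/50 mix (leader utility 7/2), while against type B it is to play action 0 outright (utility 4). Committing to B's strategy when the follower is actually type A drops the leader's utility to 2, which is the cost of a wrong type prediction that a good hypothesis class helps avoid.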
