The Sample Complexity of Stackelberg Games
Francesco Bacchiocchi, Matteo Bollini, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti
TL;DR
The paper addresses learning an optimal leader commitment in Stackelberg games when the follower's payoffs are unknown. It proposes Learn-Optimal-Commitment, a novel algorithm that does not rely on restrictive assumptions and explicitly accounts for the bit-precision of leader strategies. By combining interior sampling, robust hyperplane discovery, and a controlled binary-search framework, it achieves a sample complexity of $\tilde{O}\left(n^2\left(m^7 L \log(1/\zeta) + \binom{m+n}{m}\right)\right)$ with high probability, matching the known lower bounds in the regime where one side's action set is fixed. The approach overcomes critical limitations of prior methods (Letchford2010, Peng2019) by removing stringent assumptions, avoiding degeneracies in best-response (BR) region identification, and explicitly balancing termination probability against sample cost. This work advances practical learning in commitment-based models and lays the groundwork for applying similar techniques to other commitment-driven frameworks.
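To get a feel for how the stated bound scales, the following minimal sketch evaluates its leading term $n^2\left(m^7 L \log(1/\zeta) + \binom{m+n}{m}\right)$. The summary does not define the symbols, so the sketch assumes that $m$ and $n$ are the numbers of leader and follower actions, $L$ is a bit-complexity parameter, and $\zeta$ is a failure probability. The function name `bound_leading_term` is hypothetical, and the constants and polylogarithmic factors hidden by the $\tilde{O}$ notation are dropped, so the values are indicative only.

```python
import math


def bound_leading_term(m: int, n: int, L: int, zeta: float) -> float:
    """Leading term n^2 * (m^7 * L * log(1/zeta) + C(m+n, m)) of the stated bound.

    Assumed reading of the symbols (not defined in the summary):
    m, n = leader / follower action counts, L = bit-complexity parameter,
    zeta = failure probability. Hidden constants and log factors are ignored.
    """
    if not (0.0 < zeta < 1.0):
        raise ValueError("zeta must lie strictly between 0 and 1")
    precision_term = m**7 * L * math.log(1.0 / zeta)
    combinatorial_term = math.comb(m + n, m)  # binomial coefficient C(m+n, m)
    return n**2 * (precision_term + combinatorial_term)


if __name__ == "__main__":
    # With m held fixed, C(m+n, m) is a polynomial of degree m in n,
    # so the whole expression grows polynomially in n.
    for n in (2, 4, 8, 16, 32):
        print(n, bound_leading_term(m=3, n=n, L=16, zeta=0.05))
```

Because $\binom{m+n}{m} = \binom{m+n}{n}$, the same polynomial growth holds in $m$ when $n$ is held fixed; the binomial term becomes exponential only when both $m$ and $n$ grow together.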
Abstract
Stackelberg games (SGs) constitute the most fundamental and acclaimed models of strategic interactions involving some form of commitment. Moreover, they form the basis of more elaborate models of this kind, such as Bayesian persuasion and principal-agent problems. Addressing learning tasks in SGs and related models is crucial to operationalize them in practice, where model parameters are usually unknown. In this paper, we revisit the sample complexity of learning an optimal strategy to commit to in SGs. We provide a novel algorithm that (i) does not require any of the limiting assumptions made by state-of-the-art approaches and (ii) deals with a trade-off between sample complexity and termination probability that arises when the representation of the leader's strategies has finite precision. Such a trade-off has been completely neglected by existing algorithms and, if not properly managed, it may result in them using exponentially many samples. Our algorithm requires novel techniques, which also pave the way to addressing learning problems in other models with commitment that are ubiquitous in the real world.
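As an illustration of the learning setting, the sketch below models a single sample as one query: the leader commits to a mixed strategy and observes only which action the follower plays in response. This is an assumed, standard formalization rather than something spelled out in the abstract. The function name `best_response`, the payoff layout (rows indexed by leader actions, columns by follower actions), and the rule that ties are broken in the leader's favor are all assumptions made for the example.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def best_response(p: Sequence[float],
                  follower_payoffs: Matrix,
                  leader_payoffs: Matrix,
                  tol: float = 1e-12) -> int:
    """Return the follower action played against the leader commitment p.

    p[i] is the probability of leader action i. payoffs[i][j] is the payoff
    when the leader plays i and the follower plays j. Ties among follower
    best responses are broken in favor of the leader (an assumption).
    """
    n_follower_actions = len(follower_payoffs[0])

    def expected(payoffs: Matrix, j: int) -> float:
        # Expected payoff of follower action j under the commitment p.
        return sum(p_i * row[j] for p_i, row in zip(p, payoffs))

    follower_values = [expected(follower_payoffs, j)
                       for j in range(n_follower_actions)]
    best_value = max(follower_values)
    # All follower actions that are optimal up to numerical tolerance.
    candidates = [j for j, v in enumerate(follower_values)
                  if best_value - v <= tol]
    return max(candidates, key=lambda j: expected(leader_payoffs, j))


if __name__ == "__main__":
    follower = [[1.0, 0.0], [0.0, 1.0]]  # unknown to the learner in practice
    leader = [[2.0, 4.0], [1.0, 3.0]]
    for commitment in ([0.75, 0.25], [0.5, 0.5], [0.25, 0.75]):
        print(commitment, best_response(commitment, follower, leader))
```

In this reading, a learning algorithm never sees `follower_payoffs` directly; it can only choose commitments and record the returned action index, and the sample complexity counts how many such queries it needs.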
