Is Learning Effective in Dynamic Strategic Interactions? Evidence from Stackelberg Games
Michael Albert, Quinlan Dawkins, Minbiao Han, Haifeng Xu
TL;DR
This work challenges the No Learning Theorem by showing that learning can be effective in dynamic Bayesian Stackelberg games with fully strategic followers. It develops a sufficient sub-group best-response (BR) condition and analyzes random games to establish average-case effectiveness, showing that a learning leader can outperform static policies even without explicit communication. The authors introduce a mixed-integer linear program to compute the exact dynamic Stackelberg equilibrium (DSE) and propose two scalable heuristics (Markovian and First-$k$) that perform well in simulations on structured and random games. The results suggest that, outside dynamic pricing, learning and commitment together improve the leader's payoff without weakening follower rationality, with implications for contract design, security, and strategic pricing.
Abstract
In many settings of interest, one party, the leader, sets a policy in order to influence the action of another party, the follower, whose response is determined by private information. A natural question is whether the leader can improve their strategy by learning about the unknown follower through repeated interactions. A well-known folk theorem from dynamic pricing, a special case of this leader-follower setting, suggests that the leader cannot learn effectively when the follower is fully strategic. This has led to a large literature on learning in strategic settings that obtains positive results only by limiting the follower's strategic space. In this paper, we study dynamic Bayesian Stackelberg games, where a leader and a \emph{fully strategic} follower interact repeatedly, with the follower's type unknown. Contrary to existing results, we show that the leader can improve their utility through learning in repeated play. Using a novel average-case analysis, we demonstrate that learning is effective in these settings without needing to weaken the follower's strategic space. Importantly, this improvement is not solely due to the leader's ability to commit, nor does learning simply substitute for communication between the parties. We provide an algorithm, based on a mixed-integer linear program, to compute the optimal leader policy in these games, and we develop heuristic algorithms to approximate the optimal dynamic policy more efficiently. Through simulations, we compare the efficiency and runtime of these algorithms against static policies.
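The one-shot version of the leader-follower setting described above can be made concrete with a minimal sketch. It evaluates a committed leader strategy in a static Bayesian Stackelberg game: each follower type best-responds to the commitment, and the leader's value is the prior-weighted average over types. The payoff matrices, the prior, the tie-breaking rule in favor of the leader, and all function names are illustrative assumptions and are not taken from the paper. The repeated game and the mixed-integer program are not modeled here.

```python
from typing import List, Sequence

# Payoff matrices are indexed as payoff[leader_action][follower_action].
Matrix = List[List[float]]


def expected(payoff: Matrix, x: Sequence[float], j: int) -> float:
    """Expected payoff when the leader plays mixed strategy x and the follower plays j."""
    return sum(x[i] * payoff[i][j] for i in range(len(x)))


def best_response(x: Sequence[float], leader_payoff: Matrix, follower_payoff: Matrix) -> int:
    """Follower action maximizing the follower's expected payoff against x.

    Ties are broken in favor of the leader (an assumption of this sketch).
    """
    n_actions = len(follower_payoff[0])
    return max(
        range(n_actions),
        key=lambda j: (expected(follower_payoff, x, j), expected(leader_payoff, x, j)),
    )


def leader_value(
    x: Sequence[float],
    leader_payoff: Matrix,
    follower_payoffs: Sequence[Matrix],
    prior: Sequence[float],
) -> float:
    """Leader's expected utility from committing to x against an unknown follower type."""
    total = 0.0
    for p, follower_payoff in zip(prior, follower_payoffs):
        j = best_response(x, leader_payoff, follower_payoff)
        total += p * expected(leader_payoff, x, j)
    return total


# Hypothetical 2x2 game with two equally likely follower types.
LEADER = [[2.0, 4.0], [1.0, 3.0]]
TYPE_A = [[1.0, 0.0], [0.0, 2.0]]
TYPE_B = [[0.0, 1.0], [1.0, 0.0]]
PRIOR = [0.5, 0.5]

pure_value = leader_value([1.0, 0.0], LEADER, [TYPE_A, TYPE_B], PRIOR)
mixed_value = leader_value([0.5, 0.5], LEADER, [TYPE_A, TYPE_B], PRIOR)
print(pure_value, mixed_value)
```

In this toy instance the uniform mixed commitment yields a higher leader value than the first pure action, which illustrates the value of commitment alone. The paper's question is what repeated interaction and learning add beyond such a static policy.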
