When the Correct Model Fails: The Optimality of Stackelberg Equilibria with Follower Intention Updates
Cayetana Salinas-Rodriguez, Jonathan Rogers, Sarah H. Q. Li
TL;DR
This work examines dynamic Stackelberg games in which the leader does not know the follower's best response (BR) and can update its belief about that BR during the horizon. It develops both open-loop (OL) and feedback (FB) formulations under linear time-invariant (LTI) dynamics and analyzes how a BR belief update (two beliefs $b^1$ and $b^2$, with the update at time $\tau$) affects equilibrium optimality. It shows that believing the true BR does not always minimize the total cost, owing to time inconsistency in the OL case and a possible loss of Markov perfection in the FB case. The contributions include a sufficient condition for optimality of the open-loop Stackelberg equilibrium (OLSE) under BR updates, a discussion of the MPFSE under FB updates, and numerical linear-quadratic (LQ) simulations with Bayesian BR estimation that reveal nontrivial advantages to incorrect BR beliefs in certain regimes, including collision-avoidance scenarios. The results have practical implications for designing adaptive, interactive autonomous systems, where intention estimation and belief updates must be weighed against cost trade-offs and time-consistency considerations.
Abstract
We study a two-player dynamic Stackelberg game between a leader and a follower whose intention is unknown to the leader. Classical formulations of the Stackelberg equilibrium (SE) assume that the leader knows the follower's best response (BR) function, but this is not always true in practice. We study a setting in which the leader receives an updated belief about the follower's BR before the end of the game, and the update prompts the leader, and subsequently the follower, to re-optimize their strategies. We characterize the optimality guarantees of the SE solutions under this belief update for both open-loop and feedback information structures. Interestingly, we prove that, in general, assuming an incorrect follower BR can yield a lower leader cost over the entire game than knowing the true follower BR. We support these results with numerical examples in a linear-quadratic (LQ) Stackelberg game, and use Monte Carlo simulations to show that instances in which an incorrect BR achieves a lower leader cost are non-trivial in collision-avoidance LQ Stackelberg games.
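As a point of reference for the terminology above, the following is a minimal sketch of a one-step scalar LQ Stackelberg game in which the leader plans against a believed follower BR. The dynamics $x_1 = x_0 + u_L + u_F$, the cost weights `q` and `r`, the goal parameters, and all function names are illustrative assumptions and are not taken from the paper. In this single-shot setting, with no mid-horizon belief update, planning with the true BR is optimal by construction; the sketch does not reproduce the paper's result, which concerns games where a belief update triggers re-optimization.

```python
def follower_best_response(x0, u_leader, goal, r=1.0):
    """Follower's BR: minimizes (x1 - goal)**2 + r * u_f**2,
    where x1 = x0 + u_leader + u_f (assumed scalar dynamics)."""
    return (goal - x0 - u_leader) / (1.0 + r)


def leader_action(x0, believed_goal, leader_goal=0.0, q=1.0, r=1.0):
    """Leader's Stackelberg action: minimizes (x1 - leader_goal)**2 + q * u_l**2
    after substituting the BR implied by the believed follower goal."""
    # With the BR substituted, x1 - leader_goal = c * u_l + d.
    c = r / (1.0 + r)
    d = (r * x0 + believed_goal) / (1.0 + r) - leader_goal
    return -c * d / (c * c + q)


def realized_leader_cost(x0, u_leader, true_goal, leader_goal=0.0, q=1.0, r=1.0):
    """Leader's cost when the follower actually responds with its true BR."""
    u_follower = follower_best_response(x0, u_leader, true_goal, r)
    x1 = x0 + u_leader + u_follower
    return (x1 - leader_goal) ** 2 + q * u_leader ** 2


if __name__ == "__main__":
    x0, true_goal, wrong_goal = 1.0, 2.0, -1.0  # hypothetical values
    for label, belief in [("true belief", true_goal), ("wrong belief", wrong_goal)]:
        u_l = leader_action(x0, belief)
        print(label, realized_leader_cost(x0, u_l, true_goal))
```

Here the follower's "intention" is modeled as the goal parameter in its cost, so each belief about the goal induces a different BR function for the leader to plan against.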
