Strategic Inference in Stackelberg Games: Optimal Control for Revealing Adversary Intent
Ruimeng Hu, Daniel Ralston, Xu Yang, Haosheng Zhou
TL;DR
This work develops a continuous-time Stackelberg framework in which a leader pursues a primary objective while inferring a latent follower parameter $M\in\mathbb{R}$ from the follower's entropy-regularized, randomized tracking policy. The authors derive a semi-explicit follower solution and embed the inference objective into the leader's control problem, formulating maximum likelihood estimation (MLE) and information-based criteria (variance and Fisher information) to guide strategy design. Augmented-state reformulations yield tractable, path-dependent controls, and the resulting ODE systems are proved to be well-posed. A learning-based numerical approach using recurrent neural networks and direct parameterization approximates the leader's path-dependent policy, and extensive simulations illustrate the trade-off between task performance and information gain. The framework is extended to multi-period interactions and analyzed under discrete observations, highlighting practical implications for adversarial strategic inference and potential extensions to higher dimensions and partial observability.
Abstract
We study a continuous-time stochastic Stackelberg game in which a leader seeks to accomplish a primary objective while inferring a hidden parameter of a rational follower. The follower solves an entropy-regularized tracking problem and responds to the leader's trajectory with a randomized policy. Anticipating this response, the leader designs informative controls that maximize the efficiency of estimating the follower's latent intent via maximum likelihood estimation. Unlike prior work on discrete-time or finite-candidate inverse learning, our framework enables continuous parameter inference without prior assumptions and endogenizes the information source through the follower's strategic feedback. We derive semi-explicit solutions, prove well-posedness, and develop recurrent neural network algorithms to approximate the leader's path-dependent control. Numerical experiments demonstrate how the leader balances task performance and information gain, highlighting the practical value of our approach for adversarial strategic inference.
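
A toy sketch may help show why the leader's trajectory affects how well the latent parameter can be estimated. It is not the paper's model. It assumes, purely for illustration, that the follower's observed action at each discrete step is a linear response `M * x` to the leader's position `x`, plus Gaussian noise standing in for the randomization of the follower's policy. The function names, the linear response, and all numerical values are hypothetical. Under these assumptions the MLE of `M` is a least-squares ratio and the Fisher information is `sum(x^2) / noise_std^2`, so a trajectory with larger excursions is more informative.

```python
import random


def simulate_actions(leader_path, M, noise_std, rng):
    """Hypothetical follower: linear response M * x plus Gaussian noise."""
    return [M * x + rng.gauss(0.0, noise_std) for x in leader_path]


def mle(leader_path, actions):
    """MLE of M under the assumed Gaussian model (a least-squares ratio)."""
    num = sum(x * a for x, a in zip(leader_path, actions))
    den = sum(x * x for x in leader_path)
    return num / den


def fisher_information(leader_path, noise_std):
    """Fisher information about M carried by the whole path (toy model)."""
    return sum(x * x for x in leader_path) / noise_std ** 2


rng = random.Random(0)          # fixed seed for reproducibility
M_true, noise_std = 1.5, 0.5    # illustrative values only
quiet_path = [0.1] * 50         # leader barely moves: little information
active_path = [1.0] * 50        # leader moves more: more information

for name, path in [("quiet", quiet_path), ("active", active_path)]:
    actions = simulate_actions(path, M_true, noise_std, rng)
    info = fisher_information(path, noise_std)
    print(f"{name}: M_hat={mle(path, actions):.3f}, Fisher info={info:.1f}")
```

In this toy setting the inverse of the Fisher information gives the variance of the estimator, so the "active" path produces a much tighter estimate of `M`. A larger excursion would typically cost more in terms of the leader's primary task, which is the kind of trade-off between task performance and information gain that the abstract describes.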
