Strategic Inference in Stackelberg Games: Optimal Control for Revealing Adversary Intent

Ruimeng Hu, Daniel Ralston, Xu Yang, Haosheng Zhou

TL;DR

This work develops a continuous-time Stackelberg framework where a leader seeks to complete a primary objective while inferring a latent follower parameter $M\in\mathbb{R}$ from the follower's entropy-regularized, randomized tracking policy. By deriving a semi-explicit follower solution and embedding the inference objective into the leader's control problem, the authors formulate MLE and information-based criteria (variance and Fisher information) to guide strategy design; they provide augmented-state reformulations to produce tractable, path-dependent controls and prove well-posedness of the resulting ODE systems. A learning-based numerical approach using recurrent neural networks and direct parameterization is developed to approximate the leader's path-dependent policy, with extensive simulations showcasing the trade-off between task performance and information gain. The framework is extended to multi-period interactions and analyzed under discrete observations, highlighting practical implications for adversarial strategic inference and potential extensions to higher dimensions and partial observability.

Abstract

We study a continuous-time stochastic Stackelberg game in which a leader seeks to accomplish a primary objective while inferring a hidden parameter of a rational follower. The follower solves an entropy-regularized tracking problem and responds to the leader's trajectory with a randomized policy. Anticipating this response, the leader designs informative controls to maximize the estimation efficiency for the follower's latent intent, through maximum likelihood estimation. Unlike prior work on discrete-time or finite-candidate inverse learning, our framework enables continuous parameter inference without prior assumptions and endogenizes the information source through the follower's strategic feedback. We derive semi-explicit solutions, prove well-posedness, and develop recurrent neural network algorithms to approximate the leader's path-dependent control. Numerical experiments demonstrate how the leader balances task performance and information gain, highlighting the practical value of our approach for adversarial strategic inference.
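
The link between the leader's trajectory and the achievable estimation accuracy can be illustrated with a minimal sketch. It does not use the paper's entropy-regularized follower model. Instead it assumes a hypothetical toy follower whose state obeys $dx^F_t = \kappa (M x^L_t - x^F_t)\,dt + \sigma_F\,dW^F_t$, discretized by Euler-Maruyama. The dynamics, the function names, and all parameter values (`kappa`, `sigma_f`, `dt`, the cosine leader path) are assumptions made for illustration only. Under this toy model, the MLE of $M$ and its Fisher information have closed forms conditional on the leader's path.

```python
import math
import random


def simulate_follower(x_leader, M, kappa, sigma_f, dt, rng):
    """Euler-Maruyama path of the toy follower (hypothetical model):
    dx = kappa * (M * x_leader - x) dt + sigma_f dW, started at x = 0."""
    x = [0.0]
    for xl in x_leader:
        drift = kappa * (M * xl - x[-1])
        noise = sigma_f * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x.append(x[-1] + drift * dt + noise)
    return x


def mle_and_fisher(x_leader, x_follower, kappa, sigma_f, dt):
    """Closed-form MLE of M and Fisher information for the toy model,
    conditional on the observed leader path."""
    # "Energy" of the leader path: discrete version of int (x^L_t)^2 dt.
    energy = sum(xl * xl for xl in x_leader) * dt
    num = 0.0
    for k, xl in enumerate(x_leader):
        # y_k = kappa * M * x^L_k * dt + Gaussian noise with variance sigma_f^2 * dt
        y = (x_follower[k + 1] - x_follower[k]) + kappa * x_follower[k] * dt
        num += xl * y
    m_hat = num / (kappa * energy)
    fisher = kappa ** 2 * energy / sigma_f ** 2
    return m_hat, fisher


def demo(seed=0):
    """Compare a low-amplitude and a high-amplitude leader path."""
    rng = random.Random(seed)
    dt, n = 0.01, 1000                      # assumed horizon T = n * dt = 10
    M, kappa, sigma_f = 1.5, 2.0, 0.5       # assumed toy parameters
    results = {}
    for amplitude in (0.5, 1.0):
        x_leader = [amplitude * math.cos(k * dt) for k in range(n)]
        x_follower = simulate_follower(x_leader, M, kappa, sigma_f, dt, rng)
        results[amplitude] = mle_and_fisher(x_leader, x_follower, kappa, sigma_f, dt)
    return results


if __name__ == "__main__":
    for amplitude, (m_hat, fisher) in demo().items():
        print(f"amplitude={amplitude}: M_hat={m_hat:.3f}, Fisher info={fisher:.1f}")
```

In this sketch the Fisher information scales with the squared amplitude of the leader's path, so a leader that moves more aggressively learns $M$ more accurately but spends more control effort. This is a simplified analogue of the trade-off between task performance and information gain described above.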

Paper Structure

This paper contains 14 sections, 5 theorems, 51 equations, 5 figures.

Key Result

Proposition 2.3

Given the leader's state trajectory $x^L$, the follower's optimal Markovian randomized policy $\pi^{F,*}$ for problem \ref{eq:follower_explore_dynamics}--\ref{eq:follower_objective_func} is given by where $a_t$ and $b_t$ satisfy the ordinary differential equations (ODEs): with terminal conditions $a_T = 0,\ b_T = 0$.

Figures (5)

  • Figure 1: Leader's optimal state trajectories (left) and control trajectories (right) for the FI maximization formulation \ref{eq:leader_fisher_objective_func} with $\lambda_L = 0.5$. Solid lines represent baseline trajectories from Proposition \ref{prop:leader_optimal_control}, while dashed lines denote LSTM-based approximations. Different colors correspond to different sample paths of $W^L$.
  • Figure 2: Leader's and follower's optimal state trajectories (left) and leader's optimal control trajectories (right) under the FI maximization formulation \ref{eq:leader_fisher_objective_func} for different values of $Q_L/\lambda_L$. Solid lines denote the leader's trajectories, and dotted lines indicate the follower's trajectories. All leader trajectories are generated under the baseline control $u^{L,*}_{\mathrm{sub}}$ from Proposition \ref{prop:leader_optimal_control}, sharing the same sample path of $W^L$. The follower trajectories are plotted only for $Q_L/\lambda_L = 0.5$ and $100.0$ due to minimal variation across cases. A finer time discretization $N_T=500$ is adopted for better visualization of the tracking behavior.
  • Figure 3: Comparisons of FI (top left), control effort (top right), and optimal trajectories (bottom) under the variance minimization \ref{eq:leader_var_objective_func} and FI maximization \ref{eq:leader_fisher_objective_func} formulations. Lines/bars in the same color correspond to intensity tuples $(\lambda_L^\mathrm{Var}, \lambda_L^{I})$ in \ref{eqn:tuple_intensity}, for which $I(M)$ is approximately aligned. Each value in the top panels is estimated from $10000$ sample paths. In the bottom panels, solid and dashed lines represent optimal trajectories under variance minimization \ref{eq:leader_var_objective_func} and FI maximization \ref{eq:leader_fisher_objective_func}, respectively, sharing the same sample path of $W^L$.
  • Figure 4: Conditional bias (left) and conditional variance (right) of the MLE $\widehat{M}$ under different values of $\lambda_L$. The leader's trajectories $x^{L,*}$ are generated using the FI maximization formulation \ref{eq:leader_fisher_objective_func} with the control $u^{L,*}_{\mathrm{sub}}$ from Proposition \ref{prop:leader_optimal_control}, sharing the same sample path of $W^L$. A three-point moving average is applied to the bias plot for readability.
  • Figure 5: Inference error (left) and conditional variance (right) of the multi-period estimator $\overline{M}_N$ \ref{eqn:period_MLE} under the extended formulations \ref{eq:leader_var_objective_func}--\ref{eq:leader_fisher_objective_func}. Solid (resp. dashed) lines represent trajectories generated by the optimal control from the variance minimization \ref{eq:leader_var_objective_func} (resp. FI maximization \ref{eq:leader_fisher_objective_func}) formulation. Lines with the same color are associated with tuples $(\lambda_L^\mathrm{Var}, \lambda_L^{I})$ in \ref{eqn:tuple_intensity}, for which $I(M)$ is aligned across formulations. The leader's (resp. follower's) trajectories share the same sample path of $W^L$ (resp. $W^F$). Both quantities are plotted on a log scale for clarity.

Theorems & Definitions (13)

  • Remark 2.1: Model interpretation
  • Remark 2.2: Measurability issue
  • Proposition 2.3
  • Theorem 2.4
  • Proposition 3.1
  • Remark 3.2: Observability of $\sigma_F$
  • Proposition 3.3
  • Theorem 3.4
  • Proof 1: Proof of Proposition \ref{prop:follower_optimal_control}
  • Proof 2: Proof of Theorem \ref{thm:follower_ODE}
  • ...and 3 more