Information Revelation and Alignment Faking in Stochastic Differential Games

Daniel Ralston, Xu Yang, Ruimeng Hu

Abstract

In competitive games with private objectives, actions can reveal information about hidden parameters. Quantifying such information revelation, however, is substantially more challenging, since it depends not only on the opponent's hidden parameter but also on the opponent's model of the game. We study this problem via a two-player linear-quadratic stochastic differential game under partial information, in which each player knows its own coupling parameter and models the opponent's hidden parameter through a prior. Starting from the full-information game, we characterize the Nash equilibrium by coupled Riccati equations. We then define baseline implementable controls by averaging the equilibrium under each player's prior. Building on this baseline, we formulate an alignment-faking control problem in which one player trades off fidelity to its implementable policy against information acquisition about the opponent's hidden parameter. The information incentive is constructed from a proxy Fisher information matrix based only on the player's available model. This leads to a tractable saddle-point formulation with semi-explicit control characterization through Riccati systems. Numerical illustrations show that alignment faking can substantially improve information gain over baseline play when the faker's model is accurate, but often at the cost of greater detectability. They also show that the proxy Fisher information can systematically differ from the true information under model misspecification.

Paper Structure (15 sections, 3 theorems, 43 equations, 4 figures, 1 algorithm)


Key Result

Proposition 1

Let $\theta = \mathrm{diag}\{\theta^A, \theta^B\}$, and suppose where $c_1 = 5 (r_A^{-2} + r_B^{-2})^{1/2}$ and $c_2 = \sqrt{q_A^2 + q_B^2}\,(1 + m_A^2 + m_B^2)$. Then any solution to the coupled Riccati equations for $\theta^A$ and $\theta^B$ on $[0,T]$ satisfies $\|\theta(t; m_A, m_B)\|_F \le 1 + m_A^2 + m_B^2$.
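
To get a feel for the sizes involved, the constants $c_1$, $c_2$ and the Frobenius-norm bound $1 + m_A^2 + m_B^2$ can be evaluated directly from the formulas in Proposition 1. The following is a minimal sketch; the function name `prop1_constants` and the unit parameter values are illustrative assumptions, not values taken from the paper.

```python
import math


def prop1_constants(r_A, r_B, q_A, q_B, m_A, m_B):
    """Evaluate c_1, c_2 and the norm bound as written in Proposition 1.

    The function name and argument order are hypothetical; only the
    formulas themselves come from the statement of the proposition.
    """
    # c_1 = 5 (r_A^{-2} + r_B^{-2})^{1/2}
    c1 = 5.0 * math.sqrt(r_A ** -2 + r_B ** -2)
    # Bound on the Frobenius norm of theta: 1 + m_A^2 + m_B^2
    bound = 1.0 + m_A ** 2 + m_B ** 2
    # c_2 = sqrt(q_A^2 + q_B^2) (1 + m_A^2 + m_B^2)
    c2 = math.sqrt(q_A ** 2 + q_B ** 2) * bound
    return c1, c2, bound


# Illustrative (assumed) parameter values: all weights and couplings equal to 1.
c1, c2, bound = prop1_constants(1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
print(f"c_1 = {c1:.4f}, c_2 = {c2:.4f}, bound = {bound:.1f}")
```

With these assumed values, $c_1 = 5\sqrt{2}$, $c_2 = 3\sqrt{2}$, and the bound equals $3$. Note that $c_2$ and the bound both grow quadratically in the coupling parameters $m_A$ and $m_B$.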

Figures (4)

  • Figure 1: Schematic of the alignment faking game. Each player knows its own coupling parameter and holds a prior on the opponent's. Player $A$ may deviate from the baseline implementable control via alignment faking (AF), while player $B$ plays the baseline.
  • Figure 2: Combined view of AF behavior with fixed $\mu_A = m_A = 1.0$ and varying $\mu_B \in \{1.0, 1.2, 1.5, 1.8, 2.0, 2.2\}$. Top panels show state (left) and control (right) trajectories for the case $\mu_B = m_B = 1.0$. Bottom panels show asymptotic variance (left) and regression-based detectability $D^{AF}$ (right), both computed via Monte Carlo. Parameters: $q^{AF} = 5.0$, $\lambda^{AF} = 2.5$, and $\rho_A = \rho_B = 0.1$.
  • Figure 3: True asymptotic variance $[I(\gamma)^{-1}]_{m_B,m_B}$ for $\mu_A \in \{1.0, 1.25, 1.5, 1.75, 2.0\}$ and $\mu_B \in \{1.0, 1.25, 1.5, 1.75, 2.0, 2.25\}$ under both AF (solid) and no AF (dashed) gameplay. Parameters: $q^{AF} = 5.0$, $\lambda^{AF} = 2.5$, and $\rho_A = \rho_B = 0.1$.
  • Figure 4: True asymptotic variance $[I(\gamma)^{-1}]_{m_B,m_B}$ (top) and proxy asymptotic variance $[\overline{I}(\gamma)^{-1}]_{m_B,m_B}$ (bottom) versus $\lambda^{AF}\in \{0,1,2,3,4,5\}$, for standard deviations $\rho_A=\rho_B\in \{10^{-7},0.1,0.2,\ldots,1.0\}$ and under both AF (solid) and no AF (dashed) gameplay. Each point is computed by Monte Carlo. Parameters: $q^{AF}=10.0$, $\mu_A=\mu_B=1.0$.
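
The asymptotic variances reported in Figures 3 and 4 are diagonal entries of an inverse Fisher information matrix, $[I(\gamma)^{-1}]_{m_B,m_B}$. The sketch below illustrates that operation on a small example. It assumes, purely for illustration, that $\gamma$ has two components ordered as $(m_A, m_B)$; the captions do not specify the dimension of $\gamma$, and the matrix entries and the helper name `asymptotic_variance_2x2` are hypothetical.

```python
def asymptotic_variance_2x2(fisher, index):
    """Return [I^{-1}]_{index,index} for a 2x2 symmetric positive-definite I.

    `fisher` is a nested list [[a, b], [c, d]]; `index` is 0 or 1.
    This helper is a hypothetical illustration, not the paper's code.
    """
    (a, b), (c, d) = fisher
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("Fisher information must be positive definite")
    # Diagonal of the inverse of a 2x2 matrix: (d / det, a / det).
    inv_diag = (d / det, a / det)
    return inv_diag[index]


# Hypothetical Fisher information for gamma = (m_A, m_B).
fisher = [[4.0, 1.0], [1.0, 2.0]]

# Asymptotic variance for m_B: the (m_B, m_B) entry of the inverse.
var_mB = asymptotic_variance_2x2(fisher, 1)

# Reciprocal of the diagonal entry alone, which ignores the coupling with m_A.
naive = 1.0 / fisher[1][1]

print(var_mB, naive)
```

Here the inverse-matrix entry ($4/7$) exceeds the naive reciprocal ($1/2$): when the off-diagonal term is nonzero, uncertainty about $m_A$ inflates the achievable variance for $m_B$, which is why the figures report the entry of the inverse rather than the reciprocal of a single diagonal entry.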

Theorems & Definitions (7)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Remark 1: FI and Information Revelation
  • Proposition 3: Existence and Uniqueness of $\theta^{AF}$
  • proof