Information Revelation and Alignment Faking in Stochastic Differential Games

Daniel Ralston; Xu Yang; Ruimeng Hu

Abstract

In competitive games with private objectives, actions can reveal information about hidden parameters. Quantifying such information revelation is challenging, however, because it depends not only on the opponent's hidden parameter but also on the opponent's model of the game. We study this problem via a two-player linear-quadratic stochastic differential game under partial information, in which each player knows its own coupling parameter and models the opponent's hidden parameter through a prior. Starting from the full-information game, we characterize its Nash equilibrium through coupled Riccati equations. We then define baseline implementable controls by averaging the equilibrium under each player's prior. Building on this baseline, we formulate an alignment-faking control problem in which one player trades off fidelity to its implementable policy against information acquisition about the opponent's hidden parameter. The information incentive is built from a proxy Fisher information matrix that uses only the player's available model. This leads to a tractable saddle-point formulation in which the controls are characterized semi-explicitly through Riccati systems. Numerical illustrations show that alignment faking can substantially increase information gain over baseline play when the faker's model is accurate, though often at the cost of greater detectability. They also show that, under model misspecification, the proxy Fisher information can differ systematically from the true Fisher information.
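The abstract describes two steps: solving coupled Riccati equations for the full-information Nash equilibrium, then averaging that equilibrium over a prior to get an implementable control. The paper's exact equations are not reproduced on this page, so the sketch below uses a generic scalar stand-in, assumed purely for illustration: $dX_t = (aX_t + b_A u^A_t + b_B u^B_t)\,dt + \sigma\,dW_t$ with costs $\mathbb{E}[\int_0^T (q_i X_t^2 + r_i (u^i_t)^2)\,dt + g_i X_T^2]$. For this model the feedback Nash controls are $u^i_t = -(b_i/r_i)P_i(t)X_t$, where $\dot P_i = -2aP_i - q_i + (b_i^2/r_i)P_i^2 + 2(b_j^2/r_j)P_iP_j$ and $P_i(T) = g_i$. We treat $b_B$ as player $A$'s hidden parameter (a hypothetical choice; the paper's coupling parameter may enter differently). The function names `riccati_rhs`, `solve_riccati`, and `implementable_gain_A` are illustrative only.

```python
import numpy as np

def riccati_rhs(P, a, b, q, r):
    """dP/dt for the coupled scalar Riccati system of the stand-in LQ game.

    P, b, q, r are length-2 arrays indexed by player (0 = A, 1 = B).
    """
    s = b**2 / r                      # s_i = b_i^2 / r_i
    PA, PB = P
    dPA = -2 * a * PA - q[0] + s[0] * PA**2 + 2 * s[1] * PA * PB
    dPB = -2 * a * PB - q[1] + s[1] * PB**2 + 2 * s[0] * PA * PB
    return np.array([dPA, dPB])

def solve_riccati(a, b, q, r, g, T=1.0, n_steps=1000):
    """Integrate backward from P(T) = g with classical RK4 (step dt = -h)."""
    h = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    P = np.zeros((n_steps + 1, 2))
    P[-1] = g
    for k in range(n_steps, 0, -1):
        y = P[k]
        k1 = riccati_rhs(y, a, b, q, r)
        k2 = riccati_rhs(y - 0.5 * h * k1, a, b, q, r)
        k3 = riccati_rhs(y - 0.5 * h * k2, a, b, q, r)
        k4 = riccati_rhs(y - h * k3, a, b, q, r)
        P[k - 1] = y - h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return t, P

def implementable_gain_A(a, bA, q, r, g, prior_bB_samples, T=1.0, n_steps=1000):
    """Average A's equilibrium gain k_A(t) = (b_A / r_A) P_A(t) over prior samples of b_B.

    Because u^A = -k_A(t) X_t is linear in the observed state, averaging the
    control over the prior is the same as averaging the gain.
    """
    gains = []
    for bB in prior_bB_samples:
        t, P = solve_riccati(a, np.array([bA, bB]), q, r, g, T, n_steps)
        gains.append(bA / r[0] * P[:, 0])
    return t, np.mean(gains, axis=0)

# Usage: player A knows b_A = 1 and holds a N(1, 0.1^2) prior on b_B.
rng = np.random.default_rng(0)
q, r, g = np.array([1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.0, 0.0])
t, kA_bar = implementable_gain_A(0.5, 1.0, q, r, g, rng.normal(1.0, 0.1, 20), n_steps=200)
```

The backward RK4 step is chosen only for simplicity; any stiff-aware ODE solver would serve equally well here.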

Paper Structure (15 sections, 3 theorems, 43 equations, 4 figures, 1 algorithm)

This paper contains 15 sections, 3 theorems, 43 equations, 4 figures, 1 algorithm.

Introduction
Problem Setup and Baseline Nash Equilibrium
State Dynamics and Cost Functionals
Baseline Nash Equilibrium
Game under Partial Information
Alignment Faking and Information Revelation
Likelihood Ratio and Fisher Information (FI)
Proxy Alignment Faking Objective
Solution via Quadratic Ansatz and Gradient Descent
Detection of Alignment Faking
Numerical Illustrations
Trajectories, Information, and Detectability
Effect of Mean Misspecification on Information
Role of $\lambda^{AF}$ under AF Control
Conclusions and Future Work

Key Result

Proposition 1

Let $\theta = \mathrm{diag}\{\theta^A, \theta^B\}$ and suppose that … holds, where $c_1 = 5 (r_A^{-2} + r_B^{-2})^{1/2}$ and $c_2 = \sqrt{q_A^2 + q_B^2}\,(1 + m_A^2 + m_B^2)$. Then any solution of the Riccati equations (Riccati_A) through (Riccati_B) on $[0,T]$ satisfies $\|\theta(t; m_A, m_B)\|_F \le 1 + m_A^2 + m_B^2$.
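The constants in Proposition 1 and its conclusion are explicit enough to evaluate numerically. The sketch below computes $c_1$ and $c_2$ from the stated formulas and checks the Frobenius-norm bound on a given sampled trajectory of $(\theta^A, \theta^B)$, using $\|\mathrm{diag}\{\theta^A, \theta^B\}\|_F^2 = \|\theta^A\|_F^2 + \|\theta^B\|_F^2$. It does not verify the proposition's hypothesis, and the helper names are hypothetical.

```python
import numpy as np

def prop1_constants(rA, rB, qA, qB, mA, mB):
    """c_1 = 5 (r_A^-2 + r_B^-2)^(1/2),  c_2 = sqrt(q_A^2 + q_B^2) (1 + m_A^2 + m_B^2)."""
    c1 = 5.0 * np.sqrt(rA**-2 + rB**-2)
    c2 = np.sqrt(qA**2 + qB**2) * (1.0 + mA**2 + mB**2)
    return c1, c2

def frobenius_block_diag(thetaA, thetaB):
    """Frobenius norm of diag{thetaA, thetaB}; works for scalars or arrays."""
    return np.sqrt(np.sum(np.square(thetaA)) + np.sum(np.square(thetaB)))

def satisfies_prop1_bound(thetaA_path, thetaB_path, mA, mB):
    """True if ||theta(t)||_F <= 1 + m_A^2 + m_B^2 at every sampled time."""
    bound = 1.0 + mA**2 + mB**2
    norms = [frobenius_block_diag(a, b) for a, b in zip(thetaA_path, thetaB_path)]
    return max(norms) <= bound

# Usage: r_A = r_B = q_A = q_B = m_A = m_B = 1 gives c_1 = 5*sqrt(2), c_2 = 3*sqrt(2).
c1, c2 = prop1_constants(1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
```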

Figures (4)

Figure 1: Schematic of the alignment faking game. Each player knows its own coupling parameter and holds a prior on the opponent's. Player $A$ may deviate from the baseline implementable control via AF, while player $B$ plays the baseline.
Figure 2: Combined view of AF behavior with fixed $\mu_A = m_A = 1.0$ and varying $\mu_B \in \{1.0, 1.2, 1.5, 1.8, 2.0, 2.2\}$. Top panels show state (left) and control (right) trajectories for $\mu_B = m_B = 1.0$. Bottom panels show the asymptotic variance (left) and the regression-based detectability $D^{AF}$ (right), both computed via Monte Carlo. Parameters: $q^{AF} = 5.0$, $\lambda^{AF} = 2.5$, and $\rho_A = \rho_B = 0.1$.
Figure 3: True asymptotic variance $[I(\gamma)^{-1}]_{m_B,m_B}$ for $\mu_A \in \{1.0, 1.25, 1.5, 1.75, 2.0\}$ and $\mu_B \in \{1.0, 1.25, 1.5, 1.75, 2.0, 2.25\}$ under both AF (solid) and no AF (dashed) gameplay. Parameters: $q^{AF} = 5.0$, $\lambda^{AF} = 2.5$, and $\rho_A = \rho_B = 0.1$.
Figure 4: True asymptotic variance $[I(\gamma)^{-1}]_{m_B,m_B}$ (top) and proxy asymptotic variance $[\overline{I}(\gamma)^{-1}]_{m_B,m_B}$ (bottom) versus $\lambda^{AF}\in \{0,1,2,3,4,5\}$, for standard deviations $\rho_A=\rho_B\in \{10^{-7},0.1,0.2,\ldots,1.0\}$ and under both AF (solid) and no AF (dashed) gameplay. Each point is computed by Monte Carlo. Parameters: $q^{AF}=10.0$, $\mu_A=\mu_B=1.0$.
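Figures 3 and 4 report diagonal entries of an inverse Fisher information matrix, estimated by Monte Carlo. As a generic illustration (not the paper's game dynamics), consider a scalar diffusion $dX_t = b(t, X_t; \gamma)\,dt + \sigma\,dW_t$ observed continuously on $[0,T]$ with known constant $\sigma$; its Fisher information is $I(\gamma) = \sigma^{-2}\,\mathbb{E}\big[\int_0^T \nabla_\gamma b\,(\nabla_\gamma b)^\top dt\big]$. The sketch simulates Euler-Maruyama paths, accumulates this integral, and reads off an asymptotic variance such as $[I(\gamma)^{-1}]_{m_B,m_B}$. The stand-in Ornstein-Uhlenbeck drift $b = \gamma_1 - \gamma_2 x$ and the names `mc_fisher_information` and `asymptotic_variance` are assumptions for illustration only.

```python
import numpy as np

def mc_fisher_information(drift, grad_drift, gamma, x0, sigma,
                          T=1.0, n_steps=200, n_paths=2000, seed=0):
    """Monte Carlo estimate of I(gamma) = E[int_0^T grad_b grad_b^T dt] / sigma^2.

    drift(t, x, gamma) returns an array shaped like x; grad_drift(t, x, gamma)
    returns an array of shape (len(gamma), n_paths). The time integral is a
    left-point Riemann sum along Euler-Maruyama paths.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, x0, dtype=float)
    d = len(gamma)
    info = np.zeros((d, d))
    for k in range(n_steps):
        t = k * dt
        g = grad_drift(t, x, gamma)                       # (d, n_paths)
        info += (g @ g.T) / n_paths * dt / sigma**2       # path average of outer products
        x = x + drift(t, x, gamma) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return info

def asymptotic_variance(info, idx):
    """Diagonal entry [I^{-1}]_{idx, idx}: the Cramer-Rao asymptotic variance."""
    return np.linalg.inv(info)[idx, idx]

# Stand-in OU drift b(x; gamma) = gamma_1 - gamma_2 x, with gradient (1, -x).
def ou_drift(t, x, gam):
    return gam[0] - gam[1] * x

def ou_grad(t, x, gam):
    return np.vstack([np.ones_like(x), -x])

I_hat = mc_fisher_information(ou_drift, ou_grad, np.array([0.5, 1.0]), x0=1.0, sigma=0.5)
```

Simulating the paths under a player's assumed parameters instead of the true ones would give a model-based quantity in the spirit of the proxy $\overline{I}(\gamma)$; how the paper defines that proxy exactly is not shown here.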

Theorems & Definitions (7)

Proposition 1
Proof
Proposition 2
Proof
Remark 1: FI and Information Revelation
Proposition 3: Existence and Uniqueness of $\theta^{AF}$
Proof