Conjectural Online Learning with First-order Beliefs in Asymmetric Information Stochastic Games

Tao Li; Kim Hammar; Rolf Stadler; Quanyan Zhu

TL;DR

This work addresses learning in asymmetric-information stochastic games (AISGs) by replacing intractable belief hierarchies with first-order beliefs within a conjectural online learning (COL) framework. COL employs a forecaster-actor-critic (FAC) architecture: a Bayesian forecaster generates a conjecture $\widehat{\pi}_{-k,t}$ of the opponent's strategy over a lookahead horizon $\ell_k$, a critic evaluates the value $\widehat{J}^{(\pi_k,\widehat{\pi}_{-k})}$, and an actor updates the policy via an $\ell_k$-step rollout; conjectures are continuously refined using the information feedback $\mathbf{i}^k_t$. A KL-based metric $K(\widehat{\ell}_{-k}, \bm{\nu})$ measures how consistent a conjecture is with the feedback, and the posterior $\mu_t^k$ concentrates on the set of consistent conjectures $\Theta_k^*(\bm{\nu})$. The paper proves that the conjectures are asymptotically consistent and that the empirical strategy profile converges to a Berk-Nash equilibrium in repeated AISGs, while an intrusion-response case study shows faster and more stable adaptation to nonstationary attacks than reinforcement learning baselines. Overall, COL provides a practical online learning approach for resilient decision-making in socio-technical systems with asymmetric information, with potential applications in cyber-defense and IT infrastructure management.
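The posterior concentration described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a finite set of candidate conjectures, each inducing a categorical distribution over feedback symbols, and a fixed feedback distribution `true_dist` that matches none of them. The names `kl`, `bayes_update`, and `run_forecaster`, and all numbers, are hypothetical. Under these assumptions, repeated Bayesian updates should push the posterior mass toward the candidate with the smallest KL divergence from the feedback distribution.

```python
import math
import random


def kl(p, q):
    """KL divergence between two categorical distributions p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def bayes_update(posterior, likelihoods, obs):
    """One Bayes step: reweight each conjecture by its likelihood of obs."""
    unnorm = [w * lik[obs] for w, lik in zip(posterior, likelihoods)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def run_forecaster(true_dist, likelihoods, steps=2000, seed=0):
    """Sample feedback from true_dist and update a uniform prior."""
    rng = random.Random(seed)
    posterior = [1.0 / len(likelihoods)] * len(likelihoods)
    outcomes = list(range(len(true_dist)))
    for _ in range(steps):
        obs = rng.choices(outcomes, weights=true_dist)[0]
        posterior = bayes_update(posterior, likelihoods, obs)
    return posterior


# Assumed toy numbers: the feedback distribution lies outside the
# candidate set, so every conjecture is misspecified.
true_dist = [0.6, 0.3, 0.1]
likelihoods = [
    [0.8, 0.1, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.3, 0.5],
]

posterior = run_forecaster(true_dist, likelihoods)
divergences = [kl(true_dist, lik) for lik in likelihoods]
best = min(range(len(divergences)), key=divergences.__getitem__)
print("KL divergences:", divergences)
print("posterior:", posterior)
print("KL-minimizing conjecture:", best)
```

In this toy setting the consistent set is a single conjecture; the set $\Theta_k^*(\bm{\nu})$ in the paper may contain several.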

Abstract

Asymmetric information stochastic games (AISGs) arise in many complex socio-technical systems, such as cyber-physical systems and IT infrastructures. Existing computational methods for AISGs are primarily offline and cannot adapt to equilibrium deviations. Further, current methods are limited to particular information structures to avoid belief hierarchies. Considering these limitations, we propose conjectural online learning (COL), an online learning method under generic information structures in AISGs. COL uses a forecaster-actor-critic (FAC) architecture, where subjective forecasts are used to conjecture the opponents' strategies within a lookahead horizon, and Bayesian learning is used to calibrate the conjectures. To adapt strategies to nonstationary environments based on information feedback, COL uses online rollout with cost function approximation (actor-critic). We prove that the conjectures produced by COL are asymptotically consistent with the information feedback in the sense of a relaxed Bayesian consistency. We also prove that the empirical strategy profile induced by COL converges to the Berk-Nash equilibrium, a solution concept characterizing rationality under subjectivity. Experimental results from an intrusion response use case demonstrate COL's faster convergence over state-of-the-art reinforcement learning methods against nonstationary attacks.
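As a rough illustration of rollout against a conjectured opponent, the following Python sketch performs an $\ell$-step lookahead minimization on a two-state toy model and closes the horizon with a fixed terminal cost that stands in for the critic. Everything here is an assumption made for illustration: the states, costs, transition rule, discount factor, the `terminal_cost` heuristic, and the stationary conjecture `conjectured_attacker` are hypothetical and not taken from the paper.

```python
# Toy intrusion-response model; all names and numbers are illustrative
# assumptions, not the paper's experimental setup.
DEFENDER_ACTIONS = (0, 1)  # 0 = wait, 1 = recover
ATTACKER_ACTIONS = (0, 1)  # 0 = idle, 1 = attack
GAMMA = 0.9                # assumed discount factor


def step_cost(state, action):
    """Cost 2 per step while compromised (state 1), plus 1 for recovery."""
    return 2.0 * state + 1.0 * action


def next_state(state, action, attack):
    """Recovery resets to healthy; a compromised node stays compromised;
    a healthy node becomes compromised exactly when attacked."""
    if action == 1:
        return 0
    if state == 1:
        return 1
    return attack


def terminal_cost(state):
    """Hypothetical cost approximation used at the end of the horizon."""
    return 10.0 * state


def q_value(state, action, conjecture, depth):
    """Expected cost of `action` now, then optimal lookahead for depth-1 steps,
    with the opponent's move drawn from the conjectured strategy."""
    total = 0.0
    for attack in ATTACKER_ACTIONS:
        prob = conjecture(state)[attack]
        nxt = next_state(state, action, attack)
        total += prob * (
            step_cost(state, action)
            + GAMMA * lookahead(nxt, conjecture, depth - 1)
        )
    return total


def lookahead(state, conjecture, depth):
    """Depth-limited minimization with a terminal cost approximation."""
    if depth == 0:
        return terminal_cost(state)
    return min(q_value(state, a, conjecture, depth) for a in DEFENDER_ACTIONS)


def rollout_action(state, conjecture, horizon):
    """Pick the defender action that minimizes the lookahead cost."""
    return min(
        DEFENDER_ACTIONS,
        key=lambda a: q_value(state, a, conjecture, horizon),
    )


def conjectured_attacker(state):
    """Assumed conjecture: attack with probability 0.3 in every state."""
    return (0.7, 0.3)


for s in (0, 1):
    print("state", s, "-> action", rollout_action(s, conjectured_attacker, 2))
```

With these assumed numbers the defender waits while healthy and recovers once compromised. In an online setting the conjecture would be replaced each step by the forecaster's latest output, which is what lets the lookahead adapt to a changing opponent.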

Paper Structure (12 sections, 3 theorems, 29 equations, 2 figures, 2 tables, 1 algorithm)

Introduction
Asymmetric Information Stochastic Game
Asymmetric Information and Belief Hierarchy
Subjective Rationality
Conjectural Online Learning
Bayesian Forecaster and Consistent Conjecture
Equilibrium Analysis in Repeated AISGs
Case Study: Intrusion Response
Conclusion
Full Proof of Theorem 1
Full Proof of Corollary 1
Experiment Setup

Key Result

Theorem 1

For any sequence $(\bm{\pi}_{\mathbf{h}_t}, \bm{\nu}_{\mathbf{h}_t})_{t\geq 1}$ from Alg. 1, a.s.-$\mathbb{P}^{\mathscr{B},\mathscr{R}}$, where $\mathbb{P}^{\mathscr{B},\mathscr{R}}$ denotes the probability measure over the set of realizable histories $\mathbf{h}_t$ induced by $(\bm{\pi}_{\mathbf{h}_t})_{t\geq 1}$ under the rollout ($\mathscr{R}$) and Bayesian belief update ($\mathscr{B}$) ...

Figures (2)

Figure 1: One-step cycle in conjectural online learning (COL) (see also Alg. 1); player $\mathrm{k}$ updates its conjecture $\widehat{\ell}_{-\mathrm{k},t}$ about the opponent's policy parameterization by sampling from the posterior $\mu_{t}^{\mathrm{k}}$, from which it forecasts the opponent's future moves $\widehat{\pi}_{-\mathrm{k},t}$ conditional on its own first-order beliefs $\mathbf{b}_t^{\mathrm{k}}$; a rollout-based actor-critic then improves the policy against the conjectured opponent.
Figure 2: Evaluation results for the intrusion response case study; values indicate the mean; the shaded areas and the error bars indicate the 95% confidence interval based on $20$ random seeds; hyperparameters are listed in the online appendix.

Theorems & Definitions (9)

Theorem 1
proof
Definition 1: Berk-Nash Equilibrium, adapted from Esponda and Pouzo (2016)
Corollary 1
proof
Lemma 1
proof
proof
proof
