Explore Reinforced: Equilibrium Approximation with Reinforcement Learning

Ryan Yu; Mateusz Nowak; Qintong Xie; Michelle Yilin Feng; Peter Chin

TL;DR

The paper addresses the challenge of approximating equilibria such as the Coarse Correlated Equilibrium (CCE) in large, multi-step stochastic environments, where traditional equilibrium-approximation methods struggle and standard RL offers no equilibrium guarantees. It proposes Exp3-IXrl, a hybrid algorithm that keeps the RL agent's action selection separate from the CCE computation: Exp3-IX acts as a third-party observer, and a certainty threshold determines when its equilibrium-based decision is used. Empirical results in CybORG CC2 and in stochastic and deterministic multi-armed bandit (MAB) tasks show faster convergence and strong performance relative to baselines, with PPO-level results in CC2 after moderate training. The approach broadens the applicability of equilibrium-approximation techniques to complex adversarial settings and points to adaptive certainty strategies as a promising direction for future work.

Abstract

Current algorithms for approximating Coarse Correlated Equilibria (CCE) are theoretically guaranteed to converge to a strong solution concept, but they struggle with games in large stochastic environments. In contrast, modern Reinforcement Learning (RL) algorithms train faster yet yield weaker solutions. We introduce Exp3-IXrl, a blend of RL and game-theoretic approaches that separates the RL agent's action selection from the equilibrium computation while preserving the integrity of the learning process. We demonstrate that our algorithm extends equilibrium-approximation algorithms to new environments. Specifically, we show improved performance in a complex, adversarial cybersecurity network environment, the Cyber Operations Research Gym, and in classical multi-armed bandit settings.
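To make the separation between action selection and equilibrium computation more concrete, the following is a minimal sketch and not the authors' implementation. It pairs the standard Exp3-IX update (importance-weighted loss estimates with an implicit-exploration term `gamma`) with a hypothetical gating rule, `choose_action`, that overrides the RL agent's action only when the observer's most probable action reaches a certainty threshold. The class name, the function name, the parameter choices `eta` and `gamma = eta / 2`, the assumption that losses lie in [0, 1], and the exact form of the threshold rule are all assumptions made for illustration.

```python
import math


class Exp3IX:
    """Exp3-IX learner over n_actions arms; losses are assumed to lie in [0, 1]."""

    def __init__(self, n_actions, horizon):
        self.n_actions = n_actions
        # Common parameter choice for Exp3-IX (an assumption, not from the paper).
        self.eta = math.sqrt(2.0 * math.log(n_actions) / (n_actions * horizon))
        self.gamma = self.eta / 2.0
        self.cum_loss = [0.0] * n_actions  # cumulative estimated losses

    def probabilities(self):
        # Exponential weights; subtracting the minimum keeps exp() stable.
        base = min(self.cum_loss)
        weights = [math.exp(-self.eta * (c - base)) for c in self.cum_loss]
        total = sum(weights)
        return [w / total for w in weights]

    def update(self, action, loss):
        # Implicit exploration: divide by (p + gamma) instead of p.
        p = self.probabilities()[action]
        self.cum_loss[action] += loss / (p + self.gamma)


def choose_action(observer, rl_action, threshold):
    """Hypothetical gate: use the observer's top action only if it is certain enough."""
    probs = observer.probabilities()
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return best, True       # equilibrium-based decision
    return rl_action, False     # defer to the RL agent


if __name__ == "__main__":
    observer = Exp3IX(n_actions=3, horizon=100)
    print(choose_action(observer, rl_action=1, threshold=0.9))
    for _ in range(200):
        observer.update(0, 1.0)  # arms 0 and 1 keep incurring loss
        observer.update(1, 1.0)
    print(choose_action(observer, rl_action=0, threshold=0.9))
```

In this sketch the observer never samples actions itself; it only watches the losses of whichever action was played and reports when its distribution has concentrated. A fixed `threshold` is the simplest possible choice, and an adaptive rule could be substituted for it without changing the rest of the code.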
