AI Arms and Influence: Frontier Models Exhibit Sophisticated Reasoning in Simulated Nuclear Crises
Kenneth Payne
TL;DR
This study investigates how frontier large language models reason in simulated nuclear crises, revealing sophisticated dynamics of credibility, commitment, misperception, and deception. It introduces a three-phase cognitive architecture (Reflection → Forecast → Signal/Action) in a simultaneous-move setting, enabling explicit analysis of signal–action gaps and metacognition across seven crisis scenarios. Results show three distinct model personalities and context-dependent performance, with RLHF-influenced restraint that can be overcome by deadline pressure, challenging conventional theories about deterrence, escalation, and taboo norms. The work demonstrates that AI-driven crisis simulations can illuminate strategic reasoning and safety considerations, offering a calibrated tool for theory refinement and policy planning while underscoring the need to evaluate AI systems across different framings and time horizons.
Abstract
Today's leading AI models engage in sophisticated behaviour when placed in strategic competition. They spontaneously attempt deception, signaling intentions they do not mean to act on; they demonstrate rich theory of mind, reasoning about adversary beliefs and anticipating their actions; and they exhibit credible metacognitive self-awareness, assessing their own strategic abilities before deciding how to act. Here we present findings from a crisis simulation in which three frontier large language models (GPT-5.2, Claude Sonnet 4, Gemini 3 Flash) play opposing leaders in a nuclear crisis. Our simulation has direct application for national security professionals and, through its insights into AI reasoning under uncertainty, applications far beyond international crisis decision-making. Our findings both validate and challenge central tenets of strategic theory. We find support for Schelling's ideas about commitment, Kahn's escalation framework, and Jervis's work on misperception, inter alia. Yet we also find that the nuclear taboo is no impediment to nuclear escalation by our models; that strategic nuclear attack, while rare, does occur; that threats more often provoke counter-escalation than compliance; that high mutual credibility accelerates rather than deters conflict; and that no model ever chose accommodation or withdrawal, even under acute pressure, opting only for reduced levels of violence. We argue that AI simulation represents a powerful tool for strategic analysis, but only if properly calibrated against known patterns of human reasoning. Understanding how frontier models do and do not imitate human strategic logic is essential preparation for a world in which AI increasingly shapes strategic outcomes.
