AI Arms and Influence: Frontier Models Exhibit Sophisticated Reasoning in Simulated Nuclear Crises

Kenneth Payne

TL;DR

This study investigates how frontier large language models reason in simulated nuclear crises, revealing sophisticated dynamics of credibility, commitment, misperception, and deception. It introduces a three-phase cognitive architecture (Reflection → Forecast → Signal/Action) in a simultaneous-move setting, enabling explicit analysis of signal–action gaps and metacognition across seven crisis scenarios. Results show three distinct model personalities and context-dependent performance, with RLHF-influenced restraint that can be overcome by deadline pressure, challenging conventional theories about deterrence, escalation, and taboo norms. The work demonstrates that AI-driven crisis simulations can illuminate strategic reasoning and safety considerations, offering a calibrated tool for theory refinement and policy planning while underscoring the need to evaluate AI systems across framing and time horizons.

Abstract

Today's leading AI models engage in sophisticated behaviour when placed in strategic competition. They spontaneously attempt deception, signaling intentions they do not intend to act on; they demonstrate rich theory of mind, reasoning about adversary beliefs and anticipating their actions; and they exhibit credible metacognitive self-awareness, assessing their own strategic abilities before deciding how to act. Here we present findings from a crisis simulation in which three frontier large language models (GPT-5.2, Claude Sonnet 4, Gemini 3 Flash) play opposing leaders in a nuclear crisis. Our simulation has direct application for national security professionals, but also, via its insights into AI reasoning under uncertainty, has applications far beyond international crisis decision-making. Our findings both validate and challenge central tenets of strategic theory. We find support for Schelling's ideas about commitment, Kahn's escalation framework, and Jervis's work on misperception, inter alia. Yet we also find that the nuclear taboo is no impediment to nuclear escalation by our models; that strategic nuclear attack, while rare, does occur; that threats more often provoke counter-escalation than compliance; that high mutual credibility accelerated rather than deterred conflict; and that no model ever chose accommodation or withdrawal even when under acute pressure, only reduced levels of violence. We argue that AI simulation represents a powerful tool for strategic analysis, but only if properly calibrated against known patterns of human reasoning. Understanding how frontier models do and do not imitate human strategic logic is essential preparation for a world in which AI increasingly shapes strategic outcomes.

Paper Structure (93 sections, 6 figures, 27 tables)

Figures (6)

  • Figure 1: Three-phase cognitive architecture. Each turn, both players independently complete reflection, forecasting, and decision phases before committing to action.
  • Figure 2: Tournament win rates by model and temporal condition. Claude dominated open-ended scenarios but struggled under deadline pressure; GPT-5.2 showed the opposite pattern.
  • Figure 3: Maximum escalation by model and temporal condition. GPT-5.2's transformation is dramatic: median escalation jumped from 175 (open-ended) to 900 (deadline). Claude maintained its 850 ceiling across both conditions. Both instances of GPT-5.2 reaching Strategic Nuclear War (1000) resulted from the simulation's accident mechanic rather than deliberate choice: in one case GPT-5.2 chose 950 (Final Nuclear Warning) and in the other 725 (Expanded Nuclear Campaign), and random escalation pushed both to 1000. Gemini showed moderate context-sensitivity; its single instance of Strategic Nuclear War was a deliberate choice.
  • Figure 4: Nuclear escalation by threshold and model. All models engaged in nuclear signaling, but willingness to actually use nuclear weapons diverged dramatically.
  • Figure 5: Game duration by temporal condition. Open-ended games show high variance with some running the full 40 turns; deadline games cluster near their time limits.
  • ...and 1 more figure
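
The simultaneous-move turn structure in the Figure 1 caption, and the signal–action gap mentioned in the TL;DR, can be illustrated with a minimal Python sketch. This is not the paper's implementation. The names `Decision`, `play_turn`, `bluffer`, and `steady` are hypothetical, the sign convention for the gap (action minus signal) is an assumption, and the toy policies stand in for the language models. The only details taken from the text are that both players decide independently before committing and that escalation is expressed as numeric levels such as 175, 850, or 1000.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Decision:
    """One player's output for a turn (hypothetical structure)."""
    signal: int  # escalation level announced to the adversary
    action: int  # escalation level actually chosen

    @property
    def gap(self) -> int:
        # Assumed convention: negative means the player did less than it signalled.
        return self.action - self.signal


# A policy sees its own past decisions and the adversary's past decisions.
Policy = Callable[[Sequence[Decision], Sequence[Decision]], Decision]


def play_turn(
    policy_a: Policy,
    policy_b: Policy,
    history_a: List[Decision],
    history_b: List[Decision],
) -> Tuple[Decision, Decision]:
    """Run one simultaneous-move turn and record both decisions."""
    # Freeze both histories first, so neither policy can observe the
    # other's choice for the current turn.
    past_a, past_b = tuple(history_a), tuple(history_b)
    decision_a = policy_a(past_a, past_b)
    decision_b = policy_b(past_b, past_a)
    history_a.append(decision_a)
    history_b.append(decision_b)
    return decision_a, decision_b


def bluffer(own: Sequence[Decision], other: Sequence[Decision]) -> Decision:
    """Toy three-phase policy: reflect, forecast, then signal and act."""
    # Reflection: how far did the adversary go last turn?
    last_other = other[-1].action if other else 0
    # Forecast: naively assume the adversary repeats that level.
    forecast = last_other
    # Signal/Action: threaten above the forecast, but only match it.
    return Decision(signal=forecast + 100, action=forecast)


def steady(own: Sequence[Decision], other: Sequence[Decision]) -> Decision:
    """Toy policy that always signals and acts at the same fixed level."""
    return Decision(signal=200, action=200)


if __name__ == "__main__":
    hist_a: List[Decision] = []
    hist_b: List[Decision] = []
    for turn in range(1, 3):
        a, b = play_turn(bluffer, steady, hist_a, hist_b)
        print(turn, a, "gap:", a.gap, "|", b, "gap:", b.gap)
```

Freezing both histories before either policy runs is what makes the move simultaneous in this sketch: each player's signal and action for the current turn depend only on earlier turns.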