LLMs as Strategic Actors: Behavioral Alignment, Risk Calibration, and Argumentation Framing in Geopolitical Simulations

Veronika Solopova, Viktoria Skorik, Maksym Tereshchenko, Alina Haidun, Ostap Vykhopen

TL;DR

Comparing LLMs to humans on action alignment, risk calibration (measured by the severity of chosen actions), and argumentative framing grounded in international relations theory shows that models approximate human decision patterns in base simulation rounds but diverge over time, displaying distinct behavioral profiles and strategy updates.

Abstract

Large language models (LLMs) are increasingly proposed as agents in strategic decision environments, yet their behavior in structured geopolitical simulations remains under-researched. We evaluate six popular state-of-the-art LLMs alongside results from human participants across four real-world crisis simulation scenarios, requiring models to select predefined actions and justify their decisions across multiple rounds. We compare models to humans in action alignment, risk calibration through the severity of chosen actions, and argumentative framing grounded in international relations theory. Results show that models approximate human decision patterns in base simulation rounds but diverge over time, displaying distinct behavioral profiles and strategy updates. Across all models, LLM explanations for chosen actions exhibit a strong normative-cooperative framing centered on stability, coordination, and risk mitigation, with limited adversarial reasoning.

Paper Structure (25 sections, 7 figures, 9 tables)

Figures (7)

  • Figure 1: Study design: four geopolitical simulations run with human MBA participants and six LLMs under identical prompts and structured action menus.
  • Figure 2: Pairwise decision alignment among humans and LLMs by round. Each cell reports the fraction of shared questions for which two agents selected the same action (exact match). Models are ordered by alignment with humans within each round.
  • Figure 3: Distribution of chosen action severity across models and human participants. (a) Severity distributions by round. (b) Severity distributions by decision dimension (Economic, Security, Political/Diplomatic).
  • Figure 4: Surface-level characteristics of model explanations. Explanation length reflects verbosity differences across models, while lexical variability captures within-model repetition versus linguistic diversity across simulations.
  • Figure 5: Framing structure and ideological orientation across models. (a) Exact primary frame shares per model. (b) Aggregate ideological projection derived from grouped primary and secondary frames. All models cluster within the normative–cooperative region, with variation primarily in degree rather than direction of strategic orientation.
  • ...and 2 more figures
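
The exact-match alignment described in the Figure 2 caption (the fraction of shared questions on which two agents selected the same action) could be computed roughly as follows. This is a minimal sketch, not the authors' implementation: the function names, the dictionary layout mapping question IDs to chosen actions, and the agent labels are all hypothetical.

```python
from itertools import combinations


def exact_match_alignment(a, b):
    """Fraction of shared questions on which two agents chose the same action.

    `a` and `b` map question IDs to the chosen action label (assumed layout).
    Returns None when the agents share no questions.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return None
    matches = sum(a[q] == b[q] for q in shared)
    return matches / len(shared)


def pairwise_alignment(choices):
    """Exact-match alignment for every unordered pair of agents."""
    return {
        (x, y): exact_match_alignment(choices[x], choices[y])
        for x, y in combinations(sorted(choices), 2)
    }


# Hypothetical toy data: two agents, overlapping on questions q1 and q2 only.
choices = {
    "human": {"q1": "A", "q2": "B", "q3": "C"},
    "model_x": {"q1": "A", "q2": "C", "q4": "D"},
}
scores = pairwise_alignment(choices)
```

Restricting the comparison to shared questions means an agent that skipped a question is not penalized for it; whether the paper handles missing answers this way is an assumption here.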