What-If Analysis of Large Language Models: Explore the Game World Using Proactive Thinking
Yuan Sui, Yanming Zhang, Yi Liao, Yu Gu, Guohua Tang, Zhongqian Sun, Wei Yang, Bryan Hooi
TL;DR
WiA-LLM introduces What-If Analysis for large language models by building an explicit, language-based world model that forecasts the consequences of actions in a MOBA environment. The approach combines supervised fine-tuning on reasoning traces with reinforcement learning using rule-based, verifiable rewards that align forecasts with the real game dynamics. A lookahead-based inference mechanism enables model-based planning, improving strategic behavior and forecasting accuracy in Honor of Kings. Experiments show strong performance across difficulty levels and indicate that proactive forecasting improves forward-looking decision-making while preserving core language capabilities. The work advances interpretable, generalizable planning for LLMs in dynamic, partially observable environments and discusses practical deployment considerations.
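To make the idea of lookahead-based, model-based planning concrete, here is a minimal sketch. It is not the authors' implementation. It assumes the world model can be called as a function that maps a state and a candidate action to a forecast next state, and that some value function scores states. The names `toy_model`, `toy_value`, `lookahead_value`, and `plan`, along with the state fields `gold`, `hp`, and `enemy_towers`, are hypothetical stand-ins. In WiA-LLM the forecast would come from the LLM as text with a justification, not from a hand-written rule.

```python
from typing import Callable, Dict, List

State = Dict[str, float]
WorldModel = Callable[[State, str], State]  # hypothetical: (state, action) -> forecast next state
ValueFn = Callable[[State], float]          # hypothetical: scores how good a state is


def lookahead_value(state: State, actions: List[str], model: WorldModel,
                    value: ValueFn, depth: int) -> float:
    """Best value reachable from `state` within `depth` more forecast steps."""
    if depth == 0:
        return value(state)
    return max(lookahead_value(model(state, a), actions, model, value, depth - 1)
               for a in actions)


def plan(state: State, actions: List[str], model: WorldModel,
         value: ValueFn, depth: int = 2) -> str:
    """Pick the action whose forecast future has the highest lookahead value."""
    return max(actions,
               key=lambda a: lookahead_value(model(state, a), actions, model, value, depth - 1))


def toy_model(state: State, action: str) -> State:
    """Hypothetical rule-based stand-in for the LLM world model."""
    nxt = dict(state)  # never mutate the caller's state
    if action == "farm":
        nxt["gold"] += 100
    elif action == "push":
        if nxt["gold"] >= 150:
            nxt["enemy_towers"] -= 1   # enough resources: the push succeeds
        else:
            nxt["hp"] -= 0.5           # under-resourced push costs health
    return nxt


def toy_value(state: State) -> float:
    """Hypothetical value: fewer enemy towers, more health, a little gold."""
    return -10 * state["enemy_towers"] + state["hp"] + state["gold"] / 1000
```

With `start = {"gold": 100, "hp": 1.0, "enemy_towers": 3}` and actions `["farm", "push"]`, a two-step lookahead chooses `"farm"`, because farming first makes a later push succeed. The recursion enumerates every action sequence up to `depth`, which is only practical for small action sets and shallow horizons.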
Abstract
Large Language Models (LLMs) are effective at reasoning and information retrieval, but they remain unreliable for decision-making in dynamic, partially observable, high-stakes environments such as MOBA games. One key limitation is weak counterfactual reasoning: LLMs struggle to conduct precise what-if analysis over candidate actions and their future consequences. We address this limitation with What-if Analysis LLM (WiA-LLM), a framework that trains an LLM as an explicit language-based world model. Instead of representing the environment with latent vectors, WiA-LLM uses language to model how the game state evolves over time under candidate actions, and it provides textual justifications for these predicted outcomes. This explicit modeling supports (1) interpretability, since the model's predictions and underlying rationales are human-readable, and (2) semantic generalization, since the model can transfer knowledge across situations that share similar game concepts (e.g., roles, objectives, or tactics). WiA-LLM is trained in two stages: supervised fine-tuning on human-like reasoning traces, followed by reinforcement learning with outcome-based rewards that depend on the discrepancy between predicted and ground-truth future states. In the Honor of Kings (HoK) environment, WiA-LLM attains 74.2% accuracy in forecasting game-state changes, 27% higher than the base model. In addition, we find that agents with WiA-LLM exhibit strategic behavior closer to that of expert players than purely reactive LLM agents do, indicating more foresight-aware and expert-aligned decision-making.
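The reinforcement-learning stage rewards forecasts according to how far the predicted future state is from the ground truth. The exact reward rule is not given here, so the sketch below is only one plausible rule-based, verifiable form. It assumes a forecast can be parsed into named numeric state fields and scores the fraction of fields predicted within a relative tolerance. The function name `forecast_reward`, the tolerance `tol`, and the field names in the example are hypothetical.

```python
from typing import Dict


def forecast_reward(predicted: Dict[str, float], actual: Dict[str, float],
                    tol: float = 0.1) -> float:
    """Hypothetical verifiable reward: the share of ground-truth state fields
    whose predicted value falls within a relative tolerance `tol`."""
    if not actual:
        return 0.0
    hits = 0
    for key, true_val in actual.items():
        if key not in predicted:
            continue  # a field the forecast omitted counts as a miss
        scale = max(abs(true_val), 1.0)  # avoid blowing up on values near zero
        if abs(predicted[key] - true_val) / scale <= tol:
            hits += 1
    return hits / len(actual)
```

For example, with ground truth `{"gold": 1000, "hp": 0.5, "towers": 3}` and forecast `{"gold": 950, "hp": 0.9, "towers": 3}`, the gold and tower predictions are within tolerance but the health prediction is not, so the reward is 2/3. Because the score is computed from logged game states by a fixed rule, it can be checked without a learned reward model.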
