Neuro-Symbolic Synergy for Interactive World Modeling
Hongyu Zhao, Siyu Zhou, Haolin Yang, Zengyi Qin, Tianyi Zhou
TL;DR
Neural language models excel at general reasoning but struggle to enforce deterministic world dynamics, while symbolic models guarantee consistency but lack expressivity. The authors introduce Neuro-Symbolic Synergy (NeSyS), which directly modulates the output distribution of a neural world model (WM) with executable symbolic rules treated as energy terms, enforcing hard constraints without relying on the model to follow rules stated in the prompt. Training proceeds in two phases with reciprocal refinement and rule-guided data selection, which yields data efficiency and consistent improvements across ScienceWorld, Webshop, and Plancraft. Empirically, NeSyS outperforms baselines across multiple backbones and tasks and is robust to data scarcity and model scale; stronger routing strategies are left as future work.
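To make the idea of rules acting as energy terms concrete, the following is a minimal sketch, not the authors' implementation. It assumes the neural WM scores a small set of candidate next states, and that each symbolic rule returns an energy (0 for allowed, infinity for forbidden). The names `rule_energy`, `reweight`, and `weight`, as well as the door example, are hypothetical.

```python
import math

def rule_energy(state, action, candidate):
    """Hypothetical hard rule: a closed door cannot be walked through."""
    if action == "go through door" and not state["door_open"]:
        # Any candidate that moves the agent violates the rule.
        return 0.0 if candidate == "agent stays in hallway" else math.inf
    return 0.0  # rule does not apply, so it adds no penalty

def reweight(log_probs, energies, weight=1.0):
    """Return probabilities proportional to p(y) * exp(-weight * E(y)).

    An infinite energy removes a candidate entirely (a hard constraint).
    """
    if weight <= 0:
        raise ValueError("weight must be positive")
    scores = [lp - weight * e for lp, e in zip(log_probs, energies)]
    finite = [s for s in scores if s != -math.inf]
    if not finite:
        raise ValueError("every candidate violates a hard rule")
    top = max(finite)  # subtract the max for numerical stability
    exps = [0.0 if s == -math.inf else math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

# The neural WM (assumed numbers) prefers the physically impossible outcome.
state = {"door_open": False}
candidates = ["agent moves to kitchen", "agent stays in hallway"]
neural_log_probs = [math.log(0.7), math.log(0.3)]
energies = [rule_energy(state, "go through door", c) for c in candidates]
adjusted = reweight(neural_log_probs, energies)
print(adjusted)  # [0.0, 1.0]
```

In this sketch the rule overrides the neural preference because the constraint is applied to the distribution itself rather than described in a prompt. A finite energy would instead act as a soft penalty.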
Abstract
Large language models (LLMs) exhibit strong general-purpose reasoning capabilities, yet they frequently hallucinate when used as world models (WMs), where strict compliance with deterministic transition rules, particularly in corner cases, is essential. In contrast, symbolic WMs provide logical consistency but lack semantic expressivity. To bridge this gap, we propose Neuro-Symbolic Synergy (NeSyS), a framework that integrates the probabilistic semantic priors of LLMs with executable symbolic rules to achieve both expressivity and robustness. NeSyS alternates training between the two models using trajectories inadequately explained by the other. Unlike rule-based prompting, the symbolic WM directly constrains the LLM by modifying its output probability distribution. The neural WM is fine-tuned only on trajectories not covered by symbolic rules, reducing training data by 50% without loss of accuracy. Extensive experiments on three distinct interactive environments, i.e., ScienceWorld, Webshop, and Plancraft, demonstrate NeSyS's consistent advantages over baselines in both WM prediction accuracy and data efficiency.
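The data-selection step, fine-tuning the neural WM only on trajectories not covered by symbolic rules, can be sketched as a simple filter. This is an illustration under stated assumptions rather than the paper's procedure. Here a transition counts as "covered" when some rule fires and reproduces the observed next state; the `Transition` type, `select_uncovered`, and the `look_rule` example are all hypothetical.

```python
from typing import Callable, List, NamedTuple, Optional

class Transition(NamedTuple):
    state: str
    action: str
    next_state: str

# A rule returns its predicted next state, or None when it does not apply.
Rule = Callable[[str, str], Optional[str]]

def select_uncovered(transitions: List[Transition], rules: List[Rule]) -> List[Transition]:
    """Keep only transitions that no rule predicts correctly."""
    uncovered = []
    for t in transitions:
        predictions = [rule(t.state, t.action) for rule in rules]
        covered = any(p == t.next_state for p in predictions if p is not None)
        if not covered:
            uncovered.append(t)
    return uncovered

def look_rule(state: str, action: str) -> Optional[str]:
    """Hypothetical rule: looking around never changes the state."""
    return state if action == "look" else None

data = [
    Transition("hallway", "look", "hallway"),      # explained by look_rule
    Transition("hallway", "go north", "kitchen"),  # no rule applies
]
finetune_set = select_uncovered(data, [look_rule])
print(len(finetune_set))  # 1
```

Only the second transition would be passed to neural fine-tuning in this sketch, since the first is already explained symbolically.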
