Neuro-Symbolic Synergy for Interactive World Modeling

Hongyu Zhao, Siyu Zhou, Haolin Yang, Zengyi Qin, Tianyi Zhou

TL;DR

Neural language models excel at general reasoning but struggle to enforce deterministic world dynamics, while symbolic models guarantee consistency but lack expressivity. The authors introduce Neuro-Symbolic Synergy (NeSyS), which directly modulates the output distribution of the neural world model (WM) with executable symbolic rules treated as energy terms, enforcing hard constraints without relying on prompt-based instructions. Training proceeds in two phases, initialization followed by reciprocal refinement, and uses rule-guided data selection to improve data efficiency. NeSyS outperforms baselines across multiple backbones on ScienceWorld, Webshop, and Plancraft, and it remains robust under data scarcity and across model scales. Stronger routing strategies are left as a direction for future work.

Abstract

Large language models (LLMs) exhibit strong general-purpose reasoning capabilities, yet they frequently hallucinate when used as world models (WMs), where strict compliance with deterministic transition rules--particularly in corner cases--is essential. In contrast, Symbolic WMs provide logical consistency but lack semantic expressivity. To bridge this gap, we propose Neuro-Symbolic Synergy (NeSyS), a framework that integrates the probabilistic semantic priors of LLMs with executable symbolic rules to achieve both expressivity and robustness. NeSyS alternates training between the two models using trajectories inadequately explained by the other. Unlike rule-based prompting, the symbolic WM directly constrains the LLM by modifying its output probability distribution. The neural WM is fine-tuned only on trajectories not covered by symbolic rules, reducing training data by 50% without loss of accuracy. Extensive experiments on three distinct interactive environments, i.e., ScienceWorld, Webshop, and Plancraft, demonstrate NeSyS's consistent advantages over baselines in both WM prediction accuracy and data efficiency.
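
To make concrete the abstract's claim that the symbolic WM "directly constrains the LLM by modifying its output probability distribution", here is a minimal sketch in the notation of the Figure 2 caption ($p_i$ for candidate likelihoods, $e_{ij}$ for the score of rule $f_j$ on candidate $i$). The additive combination in log space, the softmax renormalization, the function name `rerank`, the rule signature, and the toy door rule are all assumptions made for illustration; this page does not give the paper's exact formula.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical rule signature: (belief, action, candidate) -> score e_ij.
Rule = Callable[[str, str, str], float]


def rerank(
    belief: str,
    action: str,
    candidates: Sequence[str],
    log_probs: Sequence[float],
    rules: Sequence[Rule],
    weights: Sequence[float],
) -> Tuple[int, List[float]]:
    """Return the index of the best candidate and the modified distribution."""
    scores = []
    for cand, log_p in zip(candidates, log_probs):
        # Assumed energy term: weighted sum of rule scores for this candidate.
        energy = sum(w * f(belief, action, cand) for f, w in zip(rules, weights))
        scores.append(log_p + energy)
    # Numerically stable softmax over the adjusted scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [x / total for x in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


def locked_door_rule(belief: str, action: str, candidate: str) -> float:
    """Toy rule: opening a locked door must not yield an open door."""
    violates = (
        "locked" in belief
        and action == "open door"
        and "door is open" in candidate
    )
    return -5.0 if violates else 0.0


belief = "the door is locked"
candidates = ["the door is open", "the door is still locked"]
log_probs = [math.log(0.7), math.log(0.3)]  # made-up neural likelihoods

best, probs = rerank(
    belief, "open door", candidates, log_probs, [locked_door_rule], [1.0]
)
print(candidates[best])
```

In this toy setup the neural likelihoods alone favor the first candidate, while the penalty from the rule shifts the choice to the second. A finite penalty is used rather than negative infinity so that the normalization stays well defined even if every candidate violates a rule.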

Paper Structure (40 sections, 2 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: An example of the world modeling task. Given the current belief state and the agent's next action, the world model needs to predict the next state. Both a neural world model (LLM) and a symbolic world model fail to answer the question by themselves. We propose to combine the two world models by viewing the symbolic scores as an energy term that modifies the probability distribution of the neural world model.
  • Figure 2: Overview of NeSyS. It consists of two world models, Neural WM and Symbolic WM, implemented as an LLM $\theta$ with likelihood function $P_\theta$ and a weighted rule set $\mathcal{F}$, respectively. If no candidate choices are provided, Neural WM generates $K$ candidate pairs of next state and reward. The likelihood $p_i$ of each candidate is then computed. Symbolic WM aggregates the scores $e_{ij}$ produced by each rule $f_j$. The likelihood $p_i$ is then modified by the aggregated Symbolic WM score, and the candidate with the largest modified likelihood $\tilde{p}_i$ is chosen. For simplicity, the conditioning variables $b_t$ and $a_t$ are omitted from the arguments of $P_\theta$ and $f_j$.
  • Figure 3: Training pipeline of NeSyS. It consists of two phases. In Phase 1 (Initialization), we initialize Neural WM with a pretrained LLM. We evaluate it on the development set to separate common sense from task-specific knowledge, generating rules for the latter to initialize Symbolic WM. In Phase 2 (Reciprocal Refinement), we use Symbolic WM to perform rule-guided data selection on the training set by filtering out simple cases. The remaining "hard" data are used to fine-tune Neural WM. Symbolic WM is then refined by addressing long-tail cases where the updated Neural WM still fails. The legend is in the lower-left corner. Weight optimization for Symbolic WM is omitted for clarity.
  • Figure 4: Comparing rule-guided data selection (ours) vs. random data selection under different training budgets on Plancraft. The backbone model is Llama-3.2-1B-Instruct. The shaded region highlights the performance gain of our strategy.
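
The rule-guided data selection described in the Figure 3 caption and evaluated in Figure 4 can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Transition` type, the `select_hard_cases` function, the `toy_symbolic_predict` rule, and the coverage criterion (a transition counts as "simple" when a rule fires and predicts the recorded next state exactly) are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, NamedTuple, Optional, Sequence


class Transition(NamedTuple):
    belief: str
    action: str
    next_state: str


# Hypothetical interface: returns a predicted next state,
# or None when no symbolic rule applies to the (belief, action) pair.
SymbolicPredict = Callable[[str, str], Optional[str]]


def select_hard_cases(
    data: Sequence[Transition], symbolic_predict: SymbolicPredict
) -> List[Transition]:
    """Keep only transitions that the symbolic rules do not already explain."""
    hard = []
    for t in data:
        prediction = symbolic_predict(t.belief, t.action)
        # Assumed criterion: covered means a rule fired and matched exactly.
        if prediction is None or prediction != t.next_state:
            hard.append(t)
    return hard


def toy_symbolic_predict(belief: str, action: str) -> Optional[str]:
    """Toy rule set with a single rule about closed doors."""
    if action == "open door" and "closed" in belief:
        return "door open"
    return None


train = [
    Transition("door closed", "open door", "door open"),
    Transition("holding nothing", "look around", "you see a table"),
    Transition("door locked", "open door", "door locked"),
]

hard = select_hard_cases(train, toy_symbolic_predict)
print(len(hard), "of", len(train), "transitions kept for fine-tuning")
```

Only the transitions returned by the filter would be passed to fine-tuning of the neural WM. The random-selection baseline in Figure 4 would instead sample a subset of the same size uniformly from the training set.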