Language-Guided World Models: A Model-Based Approach to AI Control

Alex Zhang; Khanh Nguyen; Jens Tuyls; Albert Lin; Karthik Narasimhan

TL;DR

This work tackles how to endow AI agents with controllable, language-grounded world models by formalizing environment dynamics as $M(s_{t+1}, r_{t+1}, d_{t+1} \mid h_t, \boldsymbol{v})$ and learning a language-conditioned model $M_{\theta}(s_{t+1}, r_{t+1}, d_{t+1} \mid h_t, \boldsymbol{\ell})$. It introduces LWMs and an EMMA-inspired multi-modal attention mechanism to ground language descriptions to entity attributes, and establishes Messenger-WM as a benchmark to probe compositional generalization. Empirically, Transformer baselines struggle on harder settings, while EMMA-LWM substantially improves grounding and trajectory simulation, approaching an oracle with semantic parsing; it also enables pre-execution plan discussions with humans, increasing safety and transparency. The work demonstrates that language-conditioned world models can enhance controllability and safety in AI systems, and suggests a research direction toward modular, language-parameterized architectures for scalable human-AI collaboration.
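To make the interface of such a conditional model concrete, the sketch below rolls out a toy world model step by step. It is an illustration under stated assumptions, not the paper's model: the names `toy_lwm`, `rollout`, and `Step`, the one-dimensional state, and the `roles` dictionary standing in for an already-parsed language description are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only (not taken from the paper).
State = Dict[str, int]              # entity name -> position on a 1-D line
History = List[Tuple[State, str]]   # (state, action) pairs seen so far
Roles = Dict[str, str]              # stand-in for the description: entity -> role


@dataclass
class Step:
    state: State    # predicted s_{t+1}
    reward: float   # predicted r_{t+1}
    done: bool      # predicted d_{t+1}


def toy_lwm(history: History, roles: Roles) -> Step:
    """Deterministic stand-in for M_theta(s', r', d' | h_t, l).

    Assumption: the description has already been reduced to an
    entity -> role mapping; a real LWM would read raw text instead.
    """
    state, action = history[-1]
    nxt = dict(state)
    nxt["player"] += {"left": -1, "right": 1}.get(action, 0)
    reward, done = 0.0, False
    for entity, pos in nxt.items():
        if entity == "player" or pos != nxt["player"]:
            continue
        role = roles.get(entity)
        if role == "enemy":      # colliding with an enemy ends the episode
            reward, done = -1.0, True
        elif role == "goal":     # reaching the goal ends it with a reward
            reward, done = 1.0, True
    return Step(nxt, reward, done)


def rollout(
    model: Callable[[History, Roles], Step],
    state: State,
    actions: List[str],
    roles: Roles,
) -> List[Step]:
    """Simulate a trajectory by feeding each prediction back into the history."""
    history: History = []
    steps: List[Step] = []
    for action in actions:
        history.append((state, action))
        step = model(history, roles)
        steps.append(step)
        if step.done:
            break
        state = step.state
    return steps


start = {"player": 0, "whale": 2}
as_enemy = rollout(toy_lwm, start, ["right", "right"], {"whale": "enemy"})
as_goal = rollout(toy_lwm, start, ["right", "right"], {"whale": "goal"})
```

In this toy setting, changing only the description (whale as enemy versus whale as goal) flips the simulated reward for the same actions, which is the kind of control through language that the summary describes.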

Abstract

This paper introduces the concept of Language-Guided World Models (LWMs) -- probabilistic models that can simulate environments by reading texts. Agents equipped with these models provide humans with more extensive and efficient control, allowing them to simultaneously alter agent behaviors in multiple tasks via natural verbal communication. In this work, we take initial steps in developing robust LWMs that can generalize to compositionally novel language descriptions. We design a challenging world modeling benchmark based on the game of MESSENGER (Hanjie et al., 2021), featuring evaluation settings that require varying degrees of compositional generalization. Our experiments reveal the lack of generalizability of the state-of-the-art Transformer model, as it offers marginal improvements in simulation quality over a no-text baseline. We devise a more robust model by fusing the Transformer with the EMMA attention mechanism (Hanjie et al., 2021). Our model substantially outperforms the Transformer and approaches the performance of a model with an oracle semantic parsing and grounding capability. To demonstrate the practicality of this model in improving AI safety and transparency, we simulate a scenario in which the model enables an agent to present plans to a human before execution, and to revise plans based on their language feedback.
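As a rough intuition for how an attention step can ground an entity in a set of text descriptions, the sketch below implements generic scaled dot-product attention in plain Python, with an entity embedding as the query and description embeddings as keys and values. This is not the EMMA mechanism of Hanjie et al. (2021), whose details are not given here; the function names and the two-dimensional toy embeddings are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    query: Vector, keys: Sequence[Vector], values: Sequence[Vector]
) -> Tuple[List[float], List[float]]:
    """Generic scaled dot-product attention for a single query.

    Returns the attended vector and the attention weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


# Hypothetical toy embeddings: one entity and two descriptions.
entity = [4.0, 0.0]                    # query for the entity on the grid
desc_keys = [[4.0, 0.0], [0.0, 4.0]]   # description 0 mentions this entity
desc_values = [[1.0, 0.0], [0.0, 1.0]] # role information carried by each description

attended, weights = attend(entity, desc_keys, desc_values)
```

With these toy vectors, nearly all of the attention mass falls on the first description, so the attended vector is dominated by that description's value.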

Paper Structure (35 sections, 3 equations, 4 figures, 6 tables)

Introduction
Background: world models
Model-based agents can require less effort to adapt.
Observational world models.
Language-guided world models (LWMs)
Formulation
Modeling entity-based environments
Testing for compositional generalization.
The Messenger-WM benchmark
Environment dynamics.
Game manual.
Evaluation settings.
Modeling approach
State representation.
World modeling as sequence generation.
...and 20 more sections

Figures (4)

Figure 1: Language-guided world models (LWMs) offer humans an efficient mechanism to regulate artificial agents. (a) We illustrate a potential application of LWMs to improving AI safety and transparency. These models enable an agent to generate visual plans and invite a human supervisor to validate them. Moreover, the human can adjust the plans by modifying the agent's world model with language feedback, in addition to directly correcting its policy. (b) We design an architecture for LWMs that exhibits strong compositional generalization. We replace the cross-attention mechanism of the standard Transformer with a new attention mechanism inspired by Hanjie et al. (2021) to effectively incorporate language descriptions. We then train a model that auto-regressively generates tokenized observations conditioned on language descriptions and actions.
Figure 2: Messenger environment with manual.
Figure 3: A qualitative example taken from the NewAll split. The Observational model mispredicts the movement patterns of the immobile queen (the goal) and the chasing whale (the message). It also misrecognizes the whale as an enemy, predicting a wrong reward $r$ and incorrectly predicting a termination state $d$ after the player collides with this entity. The GPTHard model incorrectly identifies the queen as the message and predicts the whale to be fleeing. Meanwhile, our model EMMA-LWM accurately captures all of those roles and movements.
Figure 4: The cross entropy losses of the models when conditioned on ground-truth trajectory prefixes up to a certain length. We plot the means with 95% t-value confidence intervals. The losses generally decrease as the prefix length increases. EMMA-LWM outperforms baselines given any prefix length.
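For readers unfamiliar with the interval reported in Figure 4, the sketch below computes a mean with a 95% confidence interval based on Student's t distribution. The caption states only that 95% t-value intervals are plotted; the function name `mean_ci95`, the number of runs, and the loss values are hypothetical. The critical values are standard two-sided 95% table entries for 1 to 10 degrees of freedom.

```python
import math
import statistics
from typing import Sequence, Tuple

# Two-sided 95% critical values of Student's t, indexed by degrees of freedom.
T_CRIT_95 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}


def mean_ci95(samples: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) for a 95% t-based confidence interval."""
    n = len(samples)
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError("this sketch only supports 2 to 11 samples")
    mean = statistics.mean(samples)
    # Standard error of the mean uses the sample standard deviation (n - 1).
    sem = statistics.stdev(samples) / math.sqrt(n)
    half_width = T_CRIT_95[df] * sem
    return mean, mean - half_width, mean + half_width


# Hypothetical cross-entropy losses from five runs at one prefix length.
losses = [0.50, 0.54, 0.46, 0.52, 0.48]
mean, lower, upper = mean_ci95(losses)
```

The interval is symmetric around the mean and widens when there are fewer runs, because both the critical value and the standard error grow as the sample size shrinks.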
