Discrete World Models via Regularization

Davide Bizzaro; Luciano Serafini

TL;DR

This work introduces Discrete World Models via Regularization (DWMR): a reconstruction-free and contrastive-free method for unsupervised Boolean world-model learning that couples latent prediction with specialized regularizers.

Abstract

World models aim to capture the states and dynamics of an environment in a compact latent space. Boolean state representations are particularly useful in this setting, since they support search heuristics as well as symbolic reasoning and planning. Existing approaches keep latents informative through decoder-based reconstruction, or through contrastive or reward signals. In this work, we introduce Discrete World Models via Regularization (DWMR): a reconstruction-free and contrastive-free method for unsupervised Boolean world-model learning. Specifically, we introduce a novel world-modeling loss that couples latent prediction with specialized regularizers. These regularizers maximize the entropy and independence of the representation bits through variance, correlation, and coskewness penalties, while simultaneously enforcing a locality prior for sparse action changes. To enable effective optimization, we also introduce a novel training scheme that improves robustness to discrete roll-outs. Experiments on two benchmarks with underlying combinatorial structure show that DWMR learns more accurate representations and transitions than reconstruction-based alternatives. Finally, DWMR can also be paired with an auxiliary reconstruction decoder, and this combination yields additional gains.
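
As a rough illustration of the regularizers named in the abstract, the NumPy sketch below computes batch-level variance, correlation, coskewness, and locality penalties on a matrix of bit probabilities. The paper's exact formulas are not reproduced here: the function names, the normalizations, and the specific penalty forms are assumptions chosen for illustration, and the coskewness penalty assumes at least three bits.

```python
import itertools
import numpy as np

def standardize(p, eps=1e-8):
    """Scale each bit column to zero mean and unit variance over the batch."""
    centered = p - p.mean(axis=0, keepdims=True)
    return centered / (centered.std(axis=0, keepdims=True) + eps)

def variance_penalty(p):
    # Values in [0, 1] have variance at most 0.25 (binary and balanced),
    # so the shortfall from 0.25 is a non-negative penalty (assumed form).
    return float(np.mean(0.25 - p.var(axis=0)))

def correlation_penalty(p):
    # Mean squared off-diagonal entry of the batch correlation matrix.
    z = standardize(p)
    n, d = z.shape
    corr = z.T @ z / n
    off_diag = corr - np.diag(np.diag(corr))
    return float((off_diag ** 2).sum() / (d * (d - 1)))

def coskewness_penalty(p):
    # Mean squared third-order cross moment E[z_i z_j z_k] over distinct i, j, k.
    z = standardize(p)
    n, d = z.shape
    cosk = np.einsum("ni,nj,nk->ijk", z, z, z) / n
    i, j, k = np.ogrid[:d, :d, :d]
    distinct = (i != j) & (j != k) & (i != k)
    return float((cosk[distinct] ** 2).mean())

def locality_penalty(p, p_next):
    # Average (soft) number of bits that change across a transition.
    return float(np.abs(p_next - p).sum(axis=1).mean())

# All 8 assignments of 3 independent bits, and a batch where bit 3 = bit 1 XOR bit 2.
full = np.array(list(itertools.product([0.0, 1.0], repeat=3)))
xor = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)

for name, batch in [("independent", full), ("xor", xor)]:
    print(name, correlation_penalty(batch), coskewness_penalty(batch))
```

On the XOR batch the pairwise correlations are close to 0 while the coskewness penalty is close to 1, which shows why a third-order term can catch dependencies between bits that a correlation penalty alone misses.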

Paper Structure (28 sections, 9 equations, 2 figures, 3 tables)

Introduction
Related Works
Discrete World Models
Joint-Embedding Predictive Architectures (JEPA)
Architecture and Training
Architecture
Loss Function
Prediction Loss $\mathcal{L}_{\text{pred}}$.
Variance Regularizer $\mathcal{L}_{\text{var}}$.
Correlation Regularizer $\mathcal{L}_{\text{cor}}$.
Coskewness Regularizer $\mathcal{L}_{\text{cos}}$.
Locality Regularizer $\mathcal{L}_{\text{loc}}$.
Training Procedure
Experiments
Benchmarks
...and 13 more sections

Figures (2)

Figure 1: Overview of the model architecture and of the loss function. Encoders map successive observations into a shared Boolean latent space, and a predictor transforms the current latent state into the next, given the action. We illustrate and evaluate this setup on an 8-puzzle benchmark with MNIST digits, where actions move the blank tile and induce local state changes and new renderings of the digits. The crossed lines denote operations that stop the gradient. We introduce a training scheme with two-step updates: first, the predictor is updated based solely on the hard bits $b$; then, the encoder and predictor are updated jointly using the probabilities $p$. Test-time inference relies only on the pathway involving $b$. The parameters $\phi'$ are an EMA copy of the parameters $\phi$.
Figure 2: Example transition in IceSlider.
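
Two ingredients mentioned in the Figure 1 caption can be shown in isolation: hard bits $b$ derived from probabilities $p$, and an EMA copy $\phi'$ of the parameters $\phi$. The sketch below is a minimal illustration, assuming that hard bits are obtained by thresholding at 0.5 and that the EMA uses a decay of 0.99. Neither value, nor the function names or the dictionary layout of the parameters, comes from the paper.

```python
import numpy as np

def hard_bits(p, threshold=0.5):
    """Round bit probabilities to Boolean values (the threshold is an assumption)."""
    return (np.asarray(p) > threshold).astype(np.float64)

def ema_update(target, online, decay=0.99):
    """Return new EMA parameters: decay * target + (1 - decay) * online."""
    return {name: decay * target[name] + (1.0 - decay) * online[name]
            for name in online}

# Hypothetical parameter dictionaries for the online encoder (phi) and its EMA copy (phi').
phi = {"w": np.array([1.0, 2.0])}
phi_ema = {"w": np.array([0.0, 0.0])}
phi_ema = ema_update(phi_ema, phi, decay=0.9)

bits = hard_bits([0.2, 0.7, 0.5])
```

The EMA copy moves only a fraction of the way toward the online parameters at each step, so it changes more slowly than the parameters being trained.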
