
From monoliths to modules: Decomposing transducers for efficient world modelling

Alexander Boyd, Franz Nowak, David Hyland, Manuel Baltieri, Fernando E. Rosas

TL;DR

The paper develops an information-theoretic framework for representing, composing, and decomposing modular world models as networks of transducers, combining expressive power with structural interpretability. It introduces interfaces and transducers, analyzes their composition (including Kronecker-structured operators), and provides two factorization routes: one using latent variables and another relying on observable acausality. The authors propose algorithms for peeling monolithic transducers into prime modules, characterize coarse-graining and multiscale reductions, and connect minimal predictive representations (ε-transducers) to modular decomposition, with implications for scalable, parallelizable inference and AI safety. While latent-free factoring and causality-based methods are conceptually compelling, realizing them in practice requires handling long histories and non-stationarity; these challenges, together with empirical validation and extension to feedback-rich settings, are fruitful directions for future work.
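To make "Kronecker-structured operators" concrete, here is a minimal sketch under assumptions not taken from the paper: a finite-state transducer is stored as a Python dict mapping each input-output pair (x, y) to a matrix whose entry [r][r2] is Pr(Y_t = y, R_{t+1} = r2 | X_t = x, R_t = r), and the helpers `kron`, `parallel`, and `word_prob` are hypothetical names. For two transducers that share no inputs, outputs, or memory, the composite's labelled matrices are Kronecker products over the joint latent state (r, s), so its word probabilities factor into a product.

```python
from itertools import product


def kron(a, b):
    """Kronecker product of two matrices stored as lists of lists."""
    return [
        [a[i][j] * b[k][m] for j in range(len(a[0])) for m in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def parallel(T, U):
    """Parallel composition of two non-interacting transducers (assumed encoding).

    Each transducer maps (x, y) to a matrix M with
    M[r][r2] = Pr(Y_t = y, R_{t+1} = r2 | X_t = x, R_t = r).
    The composite reads input pairs (x, w), emits output pairs (y, z), and
    carries the joint state (r, s), so every labelled matrix is a Kronecker product.
    """
    return {
        ((x, w), (y, z)): kron(mt, mu)
        for (x, y), mt in T.items()
        for (w, z), mu in U.items()
    }


def word_prob(T, pi, xs, ys):
    """Pr(Y_{0:t} = ys | X_{0:t} = xs) = pi . T^(x0,y0) ... T^(xt,yt) . 1"""
    v = list(pi)
    for x, y in zip(xs, ys):
        M = T[(x, y)]
        v = [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]
    return sum(v)


# Toy T: a memoryless noisy copy channel (one latent state).
T = {(x, y): [[0.9 if x == y else 0.1]] for x, y in product((0, 1), repeat=2)}
# Toy U: a one-step delay whose latent state stores the previous input.
U = {
    (w, z): [[1.0 if (z == s and s2 == w) else 0.0 for s2 in (0, 1)] for s in (0, 1)]
    for w, z in product((0, 1), repeat=2)
}
pi_T, pi_U = [1.0], [1.0, 0.0]
V = parallel(T, U)
pi_V = [p * q for p in pi_T for q in pi_U]  # Kronecker product of start vectors

xs, ys = [0, 1, 1], [0, 1, 0]
ws, zs = [1, 0, 1], [0, 1, 0]
joint = word_prob(V, pi_V, list(zip(xs, ws)), list(zip(ys, zs)))
print(joint, word_prob(T, pi_T, xs, ys) * word_prob(U, pi_U, ws, zs))
```

The two printed numbers agree because (A ⊗ B)(C ⊗ D) = AC ⊗ BD; the toy channels are illustrative only and do not come from the paper.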

Abstract

World models have recently been proposed as sandbox environments in which AI agents can be trained and evaluated before deployment. Although realistic world models often have high computational demands, efficient modelling is usually possible by exploiting the fact that real-world scenarios tend to involve subcomponents that interact in a modular manner. In this paper, we explore this idea by developing a framework for decomposing complex world models represented by transducers, a class of models generalising POMDPs. Whereas the composition of transducers is well understood, our results clarify how to invert this process, deriving sub-transducers operating on distinct input-output subspaces, enabling parallelizable and interpretable alternatives to monolithic world modelling that can support distributed inference. Overall, these results lay the groundwork for bridging the structural transparency demanded by AI safety and the computational efficiency required for real-world inference.
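The decomposition question in the abstract can be posed, in a deliberately brute-force way, as a finite-horizon test. The sketch below is not the paper's algorithm: it assumes a hypothetical dict-of-matrices encoding with input pairs (x, w) and output pairs (y, z), and checks, for every word up to a chosen length, whether Pr(y, z | x, w) = Pr(y | x) Pr(z | w), i.e. whether the interface splits into two sub-transducers on distinct input-output subspaces. Passing at one horizon does not guarantee factorization at longer ones, and the cost is exponential in the length.

```python
from itertools import product


def word_prob(T, pi, xs, ys):
    """Pr(Y_{0:t} = ys | X_{0:t} = xs) = pi . T^(x0,y0) ... T^(xt,yt) . 1"""
    v = list(pi)
    for x, y in zip(xs, ys):
        M = T[(x, y)]
        v = [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]
    return sum(v)


def factorizes(V, pi, X, W, Y, Z, length, tol=1e-9):
    """Brute-force check that Pr(y, z | x, w) == Pr(y | x) * Pr(z | w)
    for every word of the given length (hypothetical helper; toy sizes only)."""

    def words(alphabet):
        return list(product(alphabet, repeat=length))

    def joint(xs, ws, ys, zs):
        return word_prob(V, pi, list(zip(xs, ws)), list(zip(ys, zs)))

    def marg_y(xs, ws, ys):  # Pr(Y = ys | X = xs, W = ws), summing out Z
        return sum(joint(xs, ws, ys, zs) for zs in words(Z))

    def marg_z(xs, ws, zs):  # Pr(Z = zs | X = xs, W = ws), summing out Y
        return sum(joint(xs, ws, ys, zs) for ys in words(Y))

    x_ref, w_ref = words(X)[0], words(W)[0]
    for xs, ws, ys, zs in product(words(X), words(W), words(Y), words(Z)):
        p_y, p_z = marg_y(xs, ws, ys), marg_z(xs, ws, zs)
        if abs(p_y - marg_y(xs, w_ref, ys)) > tol:  # Y-part must ignore W
            return False
        if abs(p_z - marg_z(x_ref, ws, zs)) > tol:  # Z-part must ignore X
            return False
        if abs(joint(xs, ws, ys, zs) - p_y * p_z) > tol:  # no residual coupling
            return False
    return True


def noisy(out, inp, p):
    """Probability that a binary symmetric channel maps inp to out."""
    return p if out == inp else 1 - p


BITS = (0, 1)
# Two independent noisy channels with a trivial latent state: factorizes.
V_split = {((x, w), (y, z)): [[noisy(y, x, 0.9) * noisy(z, w, 0.8)]]
           for x, w, y, z in product(BITS, repeat=4)}
# Z copies Y, so the Z-part depends on X: does not factorize.
V_coupled = {((x, w), (y, z)): [[noisy(y, x, 0.9) * (1.0 if z == y else 0.0)]]
             for x, w, y, z in product(BITS, repeat=4)}
print(factorizes(V_split, [1.0], BITS, BITS, BITS, BITS, length=2))    # True
print(factorizes(V_coupled, [1.0], BITS, BITS, BITS, BITS, length=2))  # False
```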

Paper Structure

This paper contains 31 sections, 7 theorems, 33 equations, 11 figures, and 2 algorithms.

Key Result

Theorem 1

A collection of conditional distributions of the form $\Pr(Y_{0:t}=y_{0:t}|X_{0:t}=x_{0:t})$ constitutes a causal interface iff it has a transducer presentation.
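The "if" direction of Theorem 1 can be checked numerically on an example. In the sketch below (hypothetical names, and an assumed encoding of the transducer as symbol-labelled matrices), the interface generated by a transducer is $\Pr(y_{0:t} \mid x_{0:t}) = \pi\, T^{(x_0,y_0)} \cdots T^{(x_t,y_t)}\, \mathbf{1}$, and causality is tested as consistency across horizons: summing out the last output must reproduce the shorter-horizon probability, which cannot depend on the last input. A toy "anticipator" whose output copies the next input fails the same test. This illustrates one direction on one example; it is not a proof.

```python
from itertools import product

BITS = (0, 1)


def word_prob(T, pi, xs, ys):
    """Pr(Y_{0:t} = ys | X_{0:t} = xs) = pi . T^(x0,y0) ... T^(xt,yt) . 1"""
    v = list(pi)
    for x, y in zip(xs, ys):
        M = T[(x, y)]
        v = [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]
    return sum(v)


def make_transducer(p=0.8):
    """Hypothetical 2-state transducer: the latent state r stores the last
    output, and the next output equals x XOR r with probability p."""
    T = {}
    for x, y in product(BITS, repeat=2):
        M = [[0.0, 0.0], [0.0, 0.0]]
        for r in BITS:
            M[r][y] = p if y == (x ^ r) else 1 - p  # next state r' = y
        T[(x, y)] = M
    return T


def is_causal(prob, X, Y, horizon, tol=1e-12):
    """Check a family prob(xs, ys) = Pr(y_{0:t} | x_{0:t}) for word lengths
    1..horizon: each member is normalised, and summing out the last output
    gives the shorter-horizon probability (so it ignores the last input)."""
    for t in range(1, horizon + 1):
        for xs in product(X, repeat=t):
            total = sum(prob(xs, ys) for ys in product(Y, repeat=t))
            if abs(total - 1.0) > tol:
                return False
            for prefix in product(Y, repeat=t - 1):
                summed = sum(prob(xs, prefix + (y,)) for y in Y)
                if abs(summed - prob(xs[:-1], prefix)) > tol:
                    return False
    return True


def anticipator(xs, ys):
    """Toy non-causal family: y_t copies the *next* input x_{t+1}, and the
    last output in the window is a fair coin."""
    if not ys:
        return 1.0
    p = 0.5
    for t in range(len(ys) - 1):
        p *= 1.0 if ys[t] == xs[t + 1] else 0.0
    return p


T, pi = make_transducer(), [1.0, 0.0]
print(is_causal(lambda xs, ys: word_prob(T, pi, xs, ys), BITS, BITS, horizon=3))  # True
print(is_causal(anticipator, BITS, BITS, horizon=3))  # False
```

The transducer passes because each $\sum_y T^{(x,y)}$ is row-stochastic, so summing out the final output collapses the last matrix to the all-ones vector regardless of $x_t$.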

Figures (11)

  • Figure 1: In this work, we first present a method for composing stochastic environments into larger ones. We then use this framework to identify procedures that reverse the process, decomposing complex environments into simpler, modular subcomponents.
  • Figure 2: A general interface and two illustrations of a causal interface. An unravelled general interface (a) takes a semi-infinite sequence of inputs $X_0, X_1, X_2, \ldots$ and stochastically transforms it into a semi-infinite sequence of outputs $Y_0, Y_1, Y_2, \ldots$, without any constraints on dependencies between inputs and outputs. An unravelled causal interface (b) shows individual inputs $X_t$ and outputs $Y_t$ unravelled into a semi-infinite sequence in time. The same object can be condensed (c) into a mapping from input pasts $\overleftarrow{X}_t$ and futures $\overrightarrow{X}_t$ to output pasts $\overleftarrow{Y}_t$ and futures $\overrightarrow{Y}_t$.
  • Figure 3: A transducer is a general model that transforms an input process $X$ into an output process $Y$, using a latent process $R$ as memory; transducers can be used to generate interfaces. The network representation draws an arrow from the input process $X$, whose source is unspecified ($\frac{X}{}$), to the output process $Y$ with latent variable $R$ ($\frac{Y}{R}$). The circuit element representation maps two inputs ($X_t$ and $R_t$) to two outputs ($Y_t$ and $R_{t+1}$). The interface circuit representation spans multiple timepoints, obtained by applying the circuit element in series.
  • Figure 4: The network in the top left shows the most general way of composing two transducers: $T$, with latent states $R$, inputs $X$, and outputs $Y$; and $U$, with latent states $S$, inputs $XY$, and outputs $Z$. A circuit element that implements this composite transducer is shown on the top right, with time proceeding from left to right. The composite transducer $V$, when applied in sequence (bottom), produces the interface from $X$ to $YZ$.
  • Figure 5: Lattice of sub-classes of transducer composition, ordered by the number of restrictions they impose. Pruning edges from left to right corresponds to limiting dependencies on the inputs; these limitations are enumerated in conditions 1 through 5. Each dotted arrow is labelled with the number of the condition needed to prune that edge. The lattice highlights three notable cases: (a) series, (b) convergent, and (c) parallel composition.
  • ...and 6 more figures
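The general composition in Figure 4, where $U$ reads both $X$ and $Y$, can also be written with Kronecker structure. Under an assumed encoding of transducers as symbol-labelled matrices, $V^{(x, yz)} = T^{(x,y)} \otimes U^{(xy, z)}$ on the joint state $(r, s)$. The sketch below (hypothetical helper names; a reading of the figure, not code from the paper) builds $V$ and checks that its word probabilities equal $\Pr_T(y \mid x)\,\Pr_U(z \mid x, y)$. The example $U$ ignores $x$, which we take to correspond to series composition, case (a) in Figure 5.

```python
from itertools import product

BITS = (0, 1)


def kron(a, b):
    """Kronecker product of two matrices stored as lists of lists."""
    return [
        [a[i][j] * b[k][m] for j in range(len(a[0])) for m in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def word_prob(T, pi, xs, ys):
    """Pr(Y_{0:t} = ys | X_{0:t} = xs) = pi . T^(x0,y0) ... T^(xt,yt) . 1"""
    v = list(pi)
    for x, y in zip(xs, ys):
        M = T[(x, y)]
        v = [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]
    return sum(v)


def compose(T, U):
    """General composition as in Figure 4 (hypothetical encoding): T reads x
    and emits y; U reads the pair (x, y) and emits z. The composite V reads x,
    emits (y, z), and carries the joint latent state (r, s)."""
    return {
        (x, (y, z)): kron(mt, mu)
        for (x, y), mt in T.items()
        for ((xu, yu), z), mu in U.items()
        if (xu, yu) == (x, y)
    }


# T: memoryless noisy copy of x (one latent state).
T = {(x, y): [[0.9 if y == x else 0.1]] for x, y in product(BITS, repeat=2)}
# U: ignores x and re-emits y one step late (latent state = previous y),
# so V is a series composition.
U = {
    ((x, y), z): [[1.0 if (z == s and s2 == y) else 0.0 for s2 in BITS] for s in BITS]
    for x, y, z in product(BITS, repeat=3)
}
pi_T, pi_U = [1.0], [1.0, 0.0]
V = compose(T, U)
pi_V = [p * q for p in pi_T for q in pi_U]  # start vector on the joint state

xs, ys, zs = (1, 1, 0), (1, 0, 0), (0, 1, 0)
p_joint = word_prob(V, pi_V, xs, list(zip(ys, zs)))
p_split = word_prob(T, pi_T, xs, ys) * word_prob(U, pi_U, list(zip(xs, ys)), zs)
print(p_joint, p_split)
```

Replacing this $U$ with one whose matrices also depend on $x$ gives the fully general case of Figure 4; the Kronecker form of `compose` is unchanged.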

Theorems & Definitions (22)

  • Definition 1
  • Definition 2
  • Theorem 1
  • Definition 3
  • Proposition 1
  • Corollary 1
  • Proposition 2
  • Definition 4
  • Lemma 1
  • Proof
  • ...and 12 more