Compete and Compose: Learning Independent Mechanisms for Modular World Models

Anson Lei; Frederik Nolte; Bernhard Schölkopf; Ingmar Posner

TL;DR

COMET introduces a modular world model that learns independent interaction mechanisms via a winner-takes-all competition and then reuses them through a composition module to adapt to novel environments with limited data. By factorising dynamics into concise, reusable primitives and training with competitive updates, COMET achieves interpretable mechanism disentanglement and improved sample efficiency on unseen domains with image-based observations. The approach demonstrates that selective mechanism activation enables data-efficient transfer, while maintaining competitive prediction performance and offering a path toward growing collections of interaction behaviours. Overall, COMET provides a principled step toward structured, interpretable world models with reusable components for continual learning and transfer.

Abstract

We present COmpetitive Mechanisms for Efficient Transfer (COMET), a modular world model which leverages reusable, independent mechanisms across different environments. COMET is trained on multiple environments with varying dynamics via a two-step process: competition and composition. This enables the model to recognise and learn transferable mechanisms. Specifically, in the competition phase, COMET is trained with a winner-takes-all gradient allocation, encouraging the emergence of independent mechanisms. These are then re-used in the composition phase, where COMET learns to re-compose learnt mechanisms in ways that capture the dynamics of intervened environments. In so doing, COMET explicitly reuses prior knowledge, enabling efficient and interpretable adaptation. We evaluate COMET on environments with image-based observations. In contrast to competitive baselines, we demonstrate that COMET captures recognisable mechanisms without supervision. Moreover, we show that COMET is able to adapt to new environments with varying numbers of objects with improved sample efficiency compared to more conventional finetuning approaches.
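The winner-takes-all gradient allocation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy linear predictors, the `[weight, bias]` layout, the squared-error loss, the learning rate, and the function name `wta_step` are all assumptions made for illustration, and the mechanism-context pairing used by COMET is omitted. The only idea carried over from the text is that every mechanism makes a prediction and only the most accurate one receives an update.

```python
def wta_step(mechanisms, x, target, lr=0.1):
    """One winner-takes-all update on a single (x, target) pair.

    `mechanisms` is a list of [weight, bias] pairs. Each pair is a toy
    linear predictor standing in for a learnt mechanism (an assumption,
    not the architecture used in the paper).
    Returns the index of the winning mechanism.
    """
    # Squared error of every mechanism on this sample.
    errors = [(w * x + b - target) ** 2 for w, b in mechanisms]

    # The winner is the mechanism with the most accurate prediction.
    winner = min(range(len(mechanisms)), key=errors.__getitem__)

    # Gradient step on the squared error for the winner only;
    # all other mechanisms are left untouched.
    w, b = mechanisms[winner]
    residual = w * x + b - target
    mechanisms[winner] = [w - lr * 2 * residual * x, b - lr * 2 * residual]
    return winner


# Usage: two hypothetical mechanisms, one sample.
mechanisms = [[1.0, 0.0], [-1.0, 0.0]]
winner = wta_step(mechanisms, x=1.0, target=1.2)
```

Because the losing mechanisms receive no gradient, they are not pulled towards behaviour that another mechanism already explains, which is the pressure towards specialisation that the abstract refers to.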

Paper Structure (38 sections, 5 equations, 9 figures, 3 tables)

Related Work
Object-Centric Representations
Mechanism-based Models
Competition of Experts
COMET: COmpetitive Mechanisms for Efficient Transfer
Problem Setup and Model Architecture
Phase 1: Learning Reusable Mechanisms via Competition
Phase 2: Learning to Compose Mechanisms in New Environments
Experiments
Experimental Setup
Baselines.
Datasets.
Disentanglement of Mechanisms
Adaptation Efficiency
Limitations
...and 23 more sections

Figures (9)

Figure 1: In the competition phase, predictions are made using all possible mechanism-context pairs for each object. Gradients are only allocated to the mechanism-context pair which produces the most accurate prediction. This encourages specialisation within the mechanisms and enables learning from environments with varying dynamics. The figure describes the prediction step for a single object.
Figure 2: Disentanglement plots showing the correlation between mechanisms chosen by the models and ground-truth interaction modes. In the ideal case, the matrices should look like permutation matrices. Here, COMET is able to learn disentangled mechanisms that correspond to ground-truth behaviours in all three domains, as indicated by the fact that each interaction mode has one main corresponding learnt mechanism. In contrast, NPS does not exhibit the same structure.
Figure 3: Rollout errors (lower is better) in unseen environments with optimal mechanism selection. Shaded areas indicate the standard error of the mean. The lower errors indicate that COMET mechanisms can be readily reused across environments without finetuning.
Figure 4: Qualitative rollouts. The colour of the tabs at the bottom of each frame indicates the 'winning' mechanism at each time step. Across all environments, the competition winner changes as the underlying interaction mode changes. Top: The particles repel each other when they are close (blue) and move independently when they are apart (green). Middle: In this traffic environment, the orange car obeys a slower speed limit and always picks the slow mechanism (orange). The blue car approaches the red light with normal driving (pink) $\rightarrow$ slows down (orange) $\rightarrow$ stops (green). Note that the orange mechanism is used as slow driving for both cars. Bottom: The player first waits to receive the ball (pink) and then moves towards the opponent's goal when in possession of the ball (orange).
Figure 5: The average rollout error in an unseen environment with different amounts of observed data in the new environment (lower is better). In all environments, all models eventually converge to similar errors given enough data. We show this explicitly in App. \ref{app:extra_rollouts} and offer further discussion. In terms of sample efficiency, in the Particle Interactions and Traffic domains, COMET achieves lower errors with few adaptation episodes. This means that COMET can learn to use the correct mechanisms with a small amount of data, which corroborates our hypothesis that composing learnt mechanisms enables sample-efficient transfer. In the Team Sports domain, NPS is not able to generate stable rollouts with the numbers of adaptation episodes shown in the plots. The dotted line indicates the performance of NPS when trained with a large amount of data. Shaded areas represent the standard error of the mean.
...and 4 more figures
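
As a reading aid for the disentanglement plots in Figure 2, the following sketch builds a matrix relating ground-truth interaction modes to selected mechanisms. It is an assumption-laden stand-in: the caption speaks of correlation, whereas this sketch uses row-normalised co-occurrence frequencies, and the function name `selection_matrix` and its integer-label inputs are hypothetical. It only illustrates why a well-disentangled model yields something close to a permutation matrix.

```python
def selection_matrix(chosen, modes, num_mechanisms, num_modes):
    """Row-normalised co-occurrence of ground-truth modes and chosen mechanisms.

    `chosen[t]` is the index of the mechanism selected at step t and
    `modes[t]` is the ground-truth interaction mode at step t.
    Entry [m][k] is the fraction of steps in mode m at which
    mechanism k was selected.
    """
    counts = [[0] * num_mechanisms for _ in range(num_modes)]
    for k, m in zip(chosen, modes):
        counts[m][k] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # Modes that never occur produce an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Usage: mechanism 1 is always picked in mode 0, mechanism 0 in mode 1.
chosen = [1, 1, 0, 0, 1, 0]
modes = [0, 0, 1, 1, 0, 1]
matrix = selection_matrix(chosen, modes, num_mechanisms=2, num_modes=2)
# matrix == [[0.0, 1.0], [1.0, 0.0]], a permutation matrix
```

A model whose mechanisms are shared indiscriminately across modes would instead produce rows with mass spread over several columns.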
