Compete and Compose: Learning Independent Mechanisms for Modular World Models

Anson Lei, Frederik Nolte, Bernhard Schölkopf, Ingmar Posner

TL;DR

COMET introduces a modular world model that learns independent interaction mechanisms via a winner-takes-all competition and then reuses them through a composition module to adapt to novel environments with limited data. By factorising dynamics into concise, reusable primitives and training with competitive updates, COMET achieves interpretable mechanism disentanglement and improved sample efficiency on unseen domains with image-based observations. The approach demonstrates that selective mechanism activation enables data-efficient transfer, while maintaining competitive prediction performance and offering a path toward growing collections of interaction behaviours. Overall, COMET provides a principled step toward structured, interpretable world models with reusable components for continual learning and transfer.

Abstract

We present COmpetitive Mechanisms for Efficient Transfer (COMET), a modular world model which leverages reusable, independent mechanisms across different environments. COMET is trained on multiple environments with varying dynamics via a two-step process: competition and composition. This enables the model to recognise and learn transferable mechanisms. Specifically, in the competition phase, COMET is trained with a winner-takes-all gradient allocation, encouraging the emergence of independent mechanisms. These are then re-used in the composition phase, where COMET learns to re-compose learnt mechanisms in ways that capture the dynamics of intervened environments. In so doing, COMET explicitly reuses prior knowledge, enabling efficient and interpretable adaptation. We evaluate COMET on environments with image-based observations. In contrast to competitive baselines, we demonstrate that COMET captures recognisable mechanisms without supervision. Moreover, we show that COMET is able to adapt to new environments with varying numbers of objects with improved sample efficiency compared to more conventional finetuning approaches.
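The winner-takes-all gradient allocation used in the competition phase can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each mechanism is a single scalar weight predicting `y = w * x`, drops the context and object structure entirely, and uses made-up ground-truth slopes. The names `winner_takes_all_step` and `train` are hypothetical. The sketch only shows the core idea that the mechanism with the lowest prediction error is the only one to receive a gradient update.

```python
import random


def winner_takes_all_step(weights, x, y, lr=0.1):
    """Update only the mechanism whose prediction is closest to y.

    Each entry of `weights` is a toy scalar mechanism (an assumption made
    for illustration, not the COMET architecture).
    """
    errors = [(w * x - y) ** 2 for w in weights]
    winner = min(range(len(weights)), key=errors.__getitem__)
    # Gradient of the squared error with respect to the winning weight only.
    grad = 2.0 * (weights[winner] * x - y) * x
    weights[winner] -= lr * grad
    return winner


def train(num_steps=2000, seed=0):
    """Train two toy mechanisms on data drawn from two dynamics modes."""
    rng = random.Random(seed)
    true_slopes = [2.0, -1.0]  # hypothetical ground-truth interaction modes
    weights = [0.5, -0.5]      # initial mechanisms
    for _ in range(num_steps):
        mode = rng.randrange(len(true_slopes))
        x = rng.uniform(0.5, 1.5)
        y = true_slopes[mode] * x
        winner_takes_all_step(weights, x, y)
    return weights


if __name__ == "__main__":
    print(train())
```

In this toy setting each mechanism ends up specialising on one of the two modes, because a mechanism that is already closer to a mode keeps winning the competition for samples from that mode.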

Paper Structure (38 sections, 5 equations, 9 figures, 3 tables)

This paper contains 38 sections, 5 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: In the competition phase, predictions are made using all possible mechanism-context pairs for each object. Gradients are only allocated to the mechanism-context pair which produces the most accurate prediction. This encourages specialisation within the mechanisms and enables learning from environments with varying dynamics. The figure describes the prediction step for a single object.
  • Figure 2: Disentanglement plots showing the correlation between mechanisms chosen by the models and ground-truth interaction modes. In the ideal case, the matrices should look like permutation matrices. Here, COMET is able to learn disentangled mechanisms that correspond to ground-truth behaviours in all three domains, as indicated by the fact that each interaction mode has one main corresponding learnt mechanism. In contrast, NPS does not exhibit the same structure.
  • Figure 3: Rollout errors (lower is better) in unseen environments with optimal mechanism selection. Shaded areas indicate the standard error of the mean. The lower errors indicate that COMET mechanisms can be readily reused across environments without finetuning.
  • Figure 4: Qualitative rollouts. The colour of the tabs at the bottom of each frame indicates the 'winning' mechanism at each time step. Across all environments, the competition winner changes as the underlying interaction mode changes. Top: The particles repel each other when they are close (blue) and move independently when they are apart (green). Middle: In this traffic environment, the orange car obeys a slower speed limit and always picks the slow mechanism (orange). The blue car approaches the red light with normal driving (pink) $\rightarrow$ slow down (orange) $\rightarrow$ stop (green). Note that the orange mechanism is used as slow driving for both cars. Bottom: The player first waits to receive the ball (pink) and then moves towards the opponent's goal when in possession of the ball (orange).
  • Figure 5: The average rollout error in an unseen environment with different amounts of observed data in the new environment (lower is better). In all environments, all models eventually converge to similar errors given enough data. We show this explicitly in App. \ref{app:extra_rollouts} and offer further discussion. In terms of sample efficiency, in the Particle Interactions and Traffic domains, COMET is able to achieve lower errors with few adaptation episodes. This means that COMET can learn to use the correct mechanisms with a small amount of data, which corroborates our hypothesis that composing learnt mechanisms enables sample-efficient transfer. In the Team Sports domain, NPS is not able to generate stable rollouts with the numbers of adaptation episodes shown in the plots. The dotted line indicates the performance of NPS when trained with a large amount of data. Shaded areas represent the standard errors of the mean.
  • ...and 4 more figures
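
The disentanglement check described for Figure 2 can be sketched as follows. This is an illustrative assumption rather than the paper's evaluation code: it builds a row-normalised co-occurrence matrix between ground-truth interaction modes and chosen mechanisms, then tests whether each mode has one dominant mechanism and no two modes share it. The function names, the `threshold` value, and the label sequences are all hypothetical.

```python
def cooccurrence_matrix(true_modes, chosen_mechanisms, num_modes, num_mechanisms):
    """Row-normalised counts: entry [i][j] is the fraction of time steps in
    ground-truth mode i for which mechanism j was selected."""
    counts = [[0] * num_mechanisms for _ in range(num_modes)]
    for mode, mech in zip(true_modes, chosen_mechanisms):
        counts[mode][mech] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def is_permutation_like(matrix, threshold=0.8):
    """True if every mode has a dominant mechanism (share >= threshold)
    and no two modes share the same dominant mechanism."""
    dominant = [max(range(len(row)), key=row.__getitem__) for row in matrix]
    strong = all(row[j] >= threshold for row, j in zip(matrix, dominant))
    return strong and len(set(dominant)) == len(dominant)


if __name__ == "__main__":
    # Made-up labels for illustration only.
    true_modes = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
    disentangled = [1, 1, 1, 2, 2, 2, 0, 0, 0, 0]
    entangled = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
    print(is_permutation_like(cooccurrence_matrix(true_modes, disentangled, 3, 3)))
    print(is_permutation_like(cooccurrence_matrix(true_modes, entangled, 3, 3)))
```

A matrix that passes this check corresponds to the ideal case in the caption, where the plot resembles a permutation matrix. The order of mechanisms is arbitrary, so the check looks for a one-to-one mapping rather than a diagonal.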