Compete and Compose: Learning Independent Mechanisms for Modular World Models
Anson Lei, Frederik Nolte, Bernhard Schölkopf, Ingmar Posner
TL;DR
COMET is a modular world model that learns independent interaction mechanisms through winner-takes-all competition, then reuses them via a composition module to adapt to novel environments from limited data. Factorising the dynamics into concise, reusable primitives and training them with competitive updates yields interpretable, disentangled mechanisms and improved sample efficiency on unseen domains with image-based observations. Selectively activating learnt mechanisms enables data-efficient transfer while keeping prediction performance competitive, and points toward models that accumulate a growing collection of interaction behaviours. COMET is thus a step toward structured, interpretable world models with reusable components for continual learning and transfer.
Abstract
We present COmpetitive Mechanisms for Efficient Transfer (COMET), a modular world model which leverages reusable, independent mechanisms across different environments. COMET is trained on multiple environments with varying dynamics via a two-step process: competition and composition. This enables the model to recognise and learn transferable mechanisms. Specifically, in the competition phase, COMET is trained with a winner-takes-all gradient allocation, encouraging the emergence of independent mechanisms. These are then reused in the composition phase, where COMET learns to recompose learnt mechanisms in ways that capture the dynamics of intervened environments. In so doing, COMET explicitly reuses prior knowledge, enabling efficient and interpretable adaptation. We evaluate COMET on environments with image-based observations. In contrast to competitive baselines, we demonstrate that COMET captures recognisable mechanisms without supervision. Moreover, we show that COMET is able to adapt to new environments with varying numbers of objects with improved sample efficiency compared to more conventional finetuning approaches.
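
To make the idea of winner-takes-all gradient allocation concrete, the following is a minimal sketch and not the authors' implementation. It assumes each mechanism is a plain linear map on a low-dimensional state vector (COMET itself works from image-based observations) and that the winner for each sample is the mechanism with the lowest squared prediction error. The function name `wta_step`, the array shapes, and the learning rate are all hypothetical choices made for illustration.

```python
import numpy as np

def wta_step(weights, x, y, lr=0.1):
    """One winner-takes-all update (illustrative sketch only).

    weights: (K, D, D) array, one linear map per hypothetical mechanism.
    x, y:    (N, D) arrays of current and next states.
    Returns the updated weights and the winning mechanism index per sample.
    """
    # Every mechanism predicts the next state for every sample.
    preds = np.einsum("kij,nj->kni", weights, x)        # (K, N, D)
    errors = ((preds - y[None]) ** 2).sum(axis=-1)      # (K, N)

    # Assumption: the winner is the mechanism with the lowest error.
    winners = errors.argmin(axis=0)                     # (N,)

    new_weights = weights.copy()
    for k in range(weights.shape[0]):
        mask = winners == k
        if not mask.any():
            continue  # a mechanism that won no samples receives no gradient
        residual = preds[k, mask] - y[mask]             # (n_k, D)
        # Gradient of the mean squared error over the samples this mechanism won.
        grad = 2.0 * residual.T @ x[mask] / mask.sum()  # (D, D)
        new_weights[k] -= lr * grad
    return new_weights, winners
```

Because each sample only updates the mechanism that already explains it best, mechanisms tend to specialise on different kinds of transitions rather than all fitting an average of the data, which is the intuition behind the competition phase described above.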
