Generalization properties of contrastive world models

Kandan Ramakrishnan, R. James Cotton, Xaq Pitkow, Andreas S. Tolias

TL;DR

The paper asks whether contrastive, object-centric world models can generalize to out-of-distribution (OOD) data. It presents an extensive evaluation of a contrastive structured world model (CSWM) under IID conditions and diverse OOD scenarios, using 2D shapes, 3D blocks, and 3-body physics datasets. The key finding is that CSWM fails to maintain object-level factorization under OOD conditions: performance declines with both the extent of the distribution shift and the prediction horizon, and visualizations show mixed object representations and incorrect transition updates. This work highlights a fundamental limitation of current contrastive, slot-based world models and motivates the development of new architectures or learning paradigms that preserve factorization to achieve human-like generalization.

Abstract

Recent work on object-centric world models aims to factorize representations in terms of objects in a completely unsupervised or self-supervised manner. Such world models are hypothesized to be a key component in addressing the generalization problem. While self-supervision has shown improved performance, OOD generalization has not been systematically and explicitly tested. In this paper, we conduct an extensive study of the generalization properties of a contrastive world model. We systematically test the model under a number of OOD generalization scenarios, such as extrapolation to new object attributes and the introduction of new conjunctions or new attributes. Our experiments show that the contrastive world model fails to generalize under the different OOD tests, and that the drop in performance depends on the extent to which the samples are OOD. When visualizing the transition updates and convolutional feature maps, we observe that any change in object attributes (such as previously unseen colors, shapes, or conjunctions of color and shape) breaks down the factorization of object representations. Overall, our work highlights the importance of object-centric representations for generalization and shows that current models are limited in their capacity to learn the representations required for human-level generalization.

Paper Structure (11 sections, 4 figures)

Figures (4)

  • Figure 1: A) Architecture of the object-centric world model used in our experiments. The world model consists of an object encoder with a slot architecture, followed by a graph neural network as the transition model. B) Datasets used to evaluate models for out-of-distribution generalization: the grid-based 2D shapes, 3D blocks, and 3-body datasets. C) Illustration of the generalization tests on the 2D shapes dataset: i) IID: training and testing data come from the same distribution; ii) New conjunction: testing on novel color-shape combinations not seen during training; iii) Extrapolation: testing on a new shape or new color absent from the training dataset; iv) New dimension: only shape or only color varies during training, while the test data contain variation in both shape and color.
  • Figure 2: Understanding the factorization of representations. A) Activation maps from the convolutional backbone of both CSWM and AE on the 2D shapes dataset. Each map corresponds to an object in the input image and indicates to what extent the encoder factorizes the representation space by object. B) Convolutional maps of both CSWM and AE on the 3D blocks dataset. C) Each plot shows the state transitions when only one object is moved in the environment.
  • Figure 3: Evaluation of the CSWM model under OOD generalization. A) H@1 prediction performance of the model under different types and extents of OOD shift. B) Convolutional maps of the model for each generalization test. C) Transition updates for each generalization test.
  • Figure 4: OOD generalization performance of CSWM on the 3D blocks and 3-body datasets.
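
The four generalization tests described in the Figure 1C caption can be sketched as train/test sets of (color, shape) pairs. This is only a minimal illustration, not the authors' data pipeline: the attribute names, the function `make_splits`, the choice of which pairs to hold out, and the choice of color as the fixed attribute in the new-dimension test are all assumptions made for the example.

```python
from itertools import product

# Hypothetical attribute vocabularies (not taken from the paper). The last
# entry of each list is reserved as the attribute never shown in training.
COLORS = ["red", "green", "blue", "yellow"]
SHAPES = ["circle", "square", "triangle", "star"]


def make_splits(colors, shapes):
    """Return {test_name: (train_pairs, test_pairs)} of (color, shape) sets."""
    seen_colors, new_color = colors[:-1], colors[-1]
    seen_shapes, new_shape = shapes[:-1], shapes[-1]
    all_seen = set(product(seen_colors, seen_shapes))

    # New conjunction: hold out one pairing per color, so every color and
    # every shape still appears in training, just never in these pairs.
    held_out = set(zip(seen_colors, seen_shapes))

    # Extrapolation: test pairs contain a color or a shape that is absent
    # from training altogether.
    extrapolation_test = {(new_color, s) for s in seen_shapes} | {
        (c, new_shape) for c in seen_colors
    }

    # New dimension: training varies shape only (color fixed, an assumption);
    # the test set varies both attributes.
    one_dim_train = {(seen_colors[0], s) for s in seen_shapes}

    return {
        "iid": (all_seen, all_seen),
        "new_conjunction": (all_seen - held_out, held_out),
        "extrapolation": (all_seen, extrapolation_test),
        "new_dimension": (one_dim_train, all_seen - one_dim_train),
    }


if __name__ == "__main__":
    for name, (train, test) in make_splits(COLORS, SHAPES).items():
        print(f"{name}: {len(train)} train pairs, {len(test)} test pairs")
```

Framing the tests this way makes the notion of "extent of OOD" concrete: a new-conjunction test reuses only familiar attributes, whereas an extrapolation test introduces an attribute value the encoder has never observed.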