Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning

Nan Rosemary Ke, Aniket Didolkar, Sarthak Mittal, Anirudh Goyal, Guillaume Lajoie, Stefan Bauer, Danilo Rezende, Yoshua Bengio, Michael Mozer, Christopher Pal

TL;DR

This paper addresses the challenge of learning causal structure and high-level representations from pixel observations in model-based RL. It introduces a benchmark suite with physics and chemistry mini-environments that allow controlled manipulation of causal graphs and provides evaluation criteria beyond structural recovery, including intervention predictions, zero-shot transfer, and downstream RL performance. The experiments show that explicit structure and modularity in models—particularly modular networks and GNNs—improve causal induction and scale better to larger graphs, with contrastive training often enhancing long-horizon predictions. The work offers an open-source platform to systematically study causal learning in world models and argues that structural biases are a fruitful inductive direction for robust, transferable model-based control.

Abstract

Inducing causal relationships from observations is a classic problem in machine learning. Most work in causality starts from the premise that the causal variables themselves are observed. However, for AI agents such as robots trying to make sense of their environment, the only observables are low-level variables like pixels in images. To generalize well, an agent must induce high-level variables, particularly those which are causal or are affected by causal variables. A central goal for AI and causality is thus the joint discovery of abstract representations and causal structure. However, we note that existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs which are impossible to manipulate parametrically (e.g., number of nodes, sparsity, causal chain length, etc.). In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them. In order to systematically probe the ability of methods to identify these variables and structures, we design a suite of benchmarking RL environments. We evaluate various representation learning algorithms from the literature and find that explicitly incorporating structure and modularity in models can help causal induction in model-based reinforcement learning.

Paper Structure

This paper contains 41 sections, 9 equations, 27 figures, 18 tables.

Figures (27)

  • Figure 1: (a)-(d): Different aspects contributing to the complexity of causal graphs. (i), (ii): Difference between observational and interventional data. In the RL setting, actions are interventions in the environment. The hammer denotes an intervention. An intervention on a variable affects not only its direct children but also all variables reachable from it. Variables impacted by the intervention have a darker shade.
  • Figure 2: Illustration of the key features of the suite. Environments have objects that interact according to the underlying causal graph, which can be based on a subset of the objects' properties. An efficient model should be able to infer the high-level causal variables from raw pixel data and learn the underlying causal graph through interactions between these high-level causal variables.
  • Figure 3: Demonstration of the weighted-block pushing environment (left: observed, right: unobserved) along with the feasible generalizations that the setup provides.
  • Figure 4: Demonstration of the vanilla chemistry environment (left: ground-truth causal graph and a sample from it, with the same sample shown to demonstrate the effect of interventions; right: the effect of interventions and how far they propagate based on the underlying causal graph).
  • Figure 5: All models have 3 components: encoder, decoder and transition model. The transition models can be monolithic models, modular models or graph neural networks (GNNs). Monolithic models don't have explicit structure. GNNs have a factorized representation of variables. Modular models have a factorized representation of both variables and directed edges to potentially model causal relationships, e.g. $A$ causing $B$.
  • ...and 22 more figures
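
The Figure 1 caption notes that an intervention affects every variable reachable from the intervened one, not only its direct children. A minimal sketch of that reachability idea follows. It assumes the causal graph is a plain adjacency dictionary mapping each variable to its children; the function name, the graph, and the variable names are hypothetical illustrations and are not taken from the paper or its benchmark code.

```python
from collections import deque


def affected_by_intervention(graph, target):
    """Return all variables reachable from `target` along directed edges.

    `graph` maps each variable name to a list of its direct children.
    This adjacency-dict format is an assumption made for illustration.
    """
    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical graph: A -> B -> C, A -> D, and E has no edges.
example_graph = {"A": ["B", "D"], "B": ["C"], "C": [], "D": [], "E": []}

# Intervening on B affects only its descendant C.
affected = affected_by_intervention(example_graph, "B")
```

In this sketch, intervening on the root of a long chain changes many downstream variables, while intervening on a leaf changes none, which is one reason chain length matters for the difficulty of a causal graph.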