The Causal Chambers: Real Physical Systems as a Testbed for AI Methodology

Juan L. Gamella; Jonas Peters; Peter Bühlmann

TL;DR

The paper addresses the scarcity of real data with known ground truth for validating AI methods by introducing two automated physical testbeds, the Wind Tunnel and the Light Tunnel, termed Causal Chambers. These devices embody well-understood physics and allow controlled interventions, so they can generate large, multi-modal datasets with ground-truth causal graphs that serve as benchmarks. The authors release open-source hardware and software plus downloadable datasets to support tasks in causal discovery, out-of-distribution generalization, change point detection, independent component analysis (ICA), symbolic regression, and physics-informed machine learning. Case studies on data from the real chambers reveal strengths and failure modes of leading algorithms. By offering an accessible, open platform for validation, the work aims to accelerate robust methodological development and reproducibility in AI.

Abstract

In some fields of AI, machine learning and statistics, the validation of new methods and algorithms is often hindered by the scarcity of suitable real-world datasets. Researchers must often turn to simulated data, which yields limited information about the applicability of the proposed methods to real problems. As a step forward, we have constructed two devices that allow us to quickly and inexpensively produce large datasets from non-trivial but well-understood physical systems. The devices, which we call causal chambers, are computer-controlled laboratories that allow us to manipulate and measure an array of variables from these physical systems, providing a rich testbed for algorithms from a variety of fields. We illustrate potential applications through a series of case studies in fields such as causal discovery, out-of-distribution generalization, change point detection, independent component analysis, and symbolic regression. For applications to causal inference, the chambers allow us to carefully perform interventions. We also provide and empirically validate a causal model of each chamber, which can be used as ground truth for different tasks. All hardware and software is made open source, and the datasets are publicly available at causalchamber.org or through the Python package causalchamber.

Paper Structure (15 sections, 6 figures)

Introduction
The Causal Chambers
The Wind Tunnel
The Light Tunnel
A Testbed for Algorithms
Causal Ground Truth
Case Studies
Causal discovery (Figure 5a)
Out-of-distribution generalization (Figure 5b)
Change point detection (Figure 5c)
Independent component analysis (Figure 6d)
Symbolic regression (Figure 6e)
Physics-informed machine learning (Figure 6f)
Discussion
Data availability

Figures (6)

Figure 1: Data collection workflow. The user provides an experiment protocol consisting of step-by-step instructions describing the data collection procedure, which the chamber then carries out without human supervision. The instructions specify when and to which values the actuators and sensor parameters should be set. They also specify when measurements of all variables should be taken and at which frequency, at a maximum of 10 Hz for the light tunnel and 7 Hz for the wind tunnel. Actuators and sensor parameters can also be set automatically by the chamber as a function of other variables in the system, such as sensor measurements. This allows introducing additional complexity for some validation tasks, as described in the section "A Testbed for Algorithms".
Figure 2: The causal chambers. (a) The wind tunnel. (b) The light tunnel with the front panel removed to show its inner components. (c,d) Diagrams of the chambers and their main components, including the amplification circuit that drives the speaker of the wind tunnel and the variables for the light tunnel camera. The variables measured by the chambers are displayed in black math print. Sensor measurements are denoted by a tilde. Manipulable variables, that is, actuators and sensor parameters, are shown in bold symbols (shown as non-bold text elsewhere in the text). A detailed description of each variable is given in the appendix on chamber variables.
Figure 3: Representation of the known effects for different chamber configurations. Bold symbols correspond to manipulable variables, such as actuators and sensor parameters (shown as non-bold text elsewhere in the text). Sensor measurements are denoted by a tilde. (a,c) Standard configurations of the chambers. (b) "Camera" configuration of the light tunnel, including images from the light tunnel ($\tilde{\text{Im}}$) and the camera parameters (Ap, ISO, $T_\text{Im}$). (d) "Pressure-control" configuration of the wind tunnel, where the load fans $L_\text{in},L_\text{out}$ are set by a control mechanism to maintain the chamber pressure $\tilde{P}_\text{dw}$ at a given level. Each effect (edge in the graph) is described in detail with additional experiments in the appendix on physical effects.
Figure 4: Examples of data produced by the chambers. (a) Numeric time-series data produced by the wind tunnel under an impulse on the intake fan load ($L_\text{in}$, red), affecting other variables in the system. (b,c) Numeric data from the light tunnel illustrating the effect of LED brightness $(L_{11}, L_{12})$ and polarizer angles ($\theta_1, \theta_2$) on the light-intensity readings ($\tilde{I}_1, \tilde{I}_2, \tilde{I}_3$). (d) Effect of the light source setting $(R,G,B)$ on the light-intensity reading of the first sensor ($\tilde{I}_1$) and the drawn current ($\tilde{C}$). (e) Examples of images from the light tunnel for a fixed light source setting (reference) and interventions on other variables that affect the resulting image.
Figure 5: Validating algorithms using the chambers (1/2). (a) Causal discovery from light-tunnel and wind-tunnel data. The tasks consist of recovering the causal graph from observational data and interventional data from the light tunnel (tasks a1 and a2), and from time-series data from the wind tunnel (task a3). We run a suitable method for each task (GES (Chickering, 2002), UT-IGSP with hyperparameter tuning (Squires et al., 2020), and PCMCI+ (Runge, 2020), respectively), and evaluate their performance in the recovery of the causal structure of the corresponding ground truth (see the section "Causal Ground Truth"). GES and PCMCI+ return a set of 12 and 5 plausible graphs, respectively, encoded by a graph with undirected edges (Chickering, 2002). For these methods, we show the precision and recall in the recovery of the directed ground-truth edges (cf. the precision and recall equation in the methods appendix) for the best- (bold) and worst-scoring graph in each set. All the graphs returned by PCMCI+ attain the same scores, performing similarly to random guessing. (b) Evaluating the out-of-distribution performance of regression methods. For each task, we try to predict a sensor measurement or actuator value ($Y$) from predictors ($X$) such as numeric measurements (task b1), images (task b2), or impulse-response curves (task b3). We evaluate the predictive performance of each method in terms of its mean absolute error (MAE) on a separate validation set from the training distribution and shifted distributions arising from manipulating the chamber variables. We display the MAE with spider charts, where each axis corresponds to a different setting. As a baseline, we show the MAE incurred when using the average of $Y$ in the training set as prediction (black, dashed). For tasks b2 and b3, the MAE is averaged over 16 random initializations of the model, with error bands corresponding to $\pm1$ standard deviation. (c) Detecting change points in the time series of different sensor measurements. We change the intake fan load ($L_\text{in}$) at random time points while keeping all other actuators and sensor parameters constant. Because the load affects all the displayed sensors, we take these time points as ground truth (vertical dotted lines) and compare them with the output of the change point detection algorithm (black crosses).
...and 1 more figure
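
The edge-recovery scores mentioned in the Figure 5a caption can be illustrated with a minimal sketch. It assumes the usual set-based definitions of precision and recall over directed edges; the paper's exact equation is not reproduced in this summary and may differ. The toy ground truth below is a hypothetical subset of edges suggested by the Figure 4d caption (light source setting affecting $\tilde{I}_1$ and $\tilde{C}$), not the full graph of either chamber. The function names, the convention for empty edge sets, and the use of the F1 score to pick the best- and worst-scoring graph are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # directed edge (cause, effect)


def precision_recall(estimated: Set[Edge], truth: Set[Edge]) -> Tuple[float, float]:
    """Precision and recall of estimated directed edges against the ground truth."""
    true_positives = len(estimated & truth)
    # Assumed convention: an empty edge set scores 1.0 rather than dividing by zero.
    precision = true_positives / len(estimated) if estimated else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed ranking criterion)."""
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


def best_and_worst(graphs: List[Set[Edge]], truth: Set[Edge]):
    """Score every graph in a set and return the (best, worst) score pairs."""
    scores = [precision_recall(g, truth) for g in graphs]
    ranked = sorted(scores, key=lambda pr: f1(*pr))
    return ranked[-1], ranked[0]


# Toy ground truth: hypothetical subset of light-tunnel edges.
truth = {("R", "I1"), ("G", "I1"), ("B", "I1"), ("R", "C")}

# Two candidate graphs, as a method returning a set of plausible graphs might produce.
graph_1 = {("R", "I1"), ("G", "I1"), ("C", "R")}  # one edge reversed
graph_2 = {("R", "I1"), ("G", "I1"), ("R", "C")}  # all edges correct, one missing

best, worst = best_and_worst([graph_1, graph_2], truth)
print("best (precision, recall):", best)
print("worst (precision, recall):", worst)
```

A reversed edge counts as both a false positive and a missed true edge under this definition, which is why `graph_1` scores lower on both measures than `graph_2`.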
