The CausalBench challenge: A machine learning contest for gene network inference from single-cell perturbation data

Mathieu Chevalley; Jacob Sackett-Sanders; Yusuf Roohani; Pascal Notin; Artemy Bakulin; Dariusz Brzezinski; Kaiwen Deng; Yuanfang Guan; Justin Hong; Michael Ibrahim; Wojciech Kotlowski; Marcin Kowiel; Panagiotis Misiakos; Achille Nazaret; Markus Püschel; Chris Wendler; Arash Mehrjou; Patrick Schwab

TL;DR

The paper introduces the CBC2023 challenge, a community effort to advance gene network inference from real-world single-cell perturbation data using the CausalBench benchmark. It analyzes submissions that combine interventional signals with causally principled reasoning, including BetterBoost, Guanlab, SparseRC, MeanDifference, and CATRAN, and reports significant gains both in leveraging interventional data and in balancing the distributional-change metrics. By standardizing evaluation on large-scale perturbational datasets (RPE1 and K562) that contain both observational and interventional data, the work shows that current methods can surpass prior baselines and that causality-grounded approaches hold particular promise when only part of the gene set is perturbed. The findings highlight the potential impact on drug discovery and causal genomics, and encourage broader participation and continued methodological innovation in real-world causal network inference.

Abstract

In drug discovery, mapping interactions between genes within cellular systems is a crucial early step. Such maps are not only foundational for understanding the molecular mechanisms underlying disease biology but also pivotal for formulating hypotheses about potential targets for new medicines. Recognizing the need to improve the construction of these gene-gene interaction networks, especially from large-scale, real-world datasets of perturbed single cells, the CausalBench Challenge was initiated. The challenge aimed to inspire the machine learning community to improve state-of-the-art methods, with an emphasis on making better use of large-scale genetic perturbation data. Using the framework provided by the CausalBench benchmark, participants were tasked with refining current methods or proposing new ones. This report analyzes and summarizes the methods submitted during the challenge to give a partial picture of the state of the art at the time of the challenge. Notably, the winning solutions significantly improved performance compared to previous baselines, establishing a new state of the art for this critical task in biology and medicine.

Paper Structure (21 sections, 8 equations, 2 figures, 5 tables)

Introduction
Benchmark Setup
Datasets
Task
Metrics
The CausalBench Challenge
Limitations
Methods
BetterBoost - Inference of Gene Regulatory Networks with Perturbation Data
Guanlab: A Supervised LightGBM-Based Approach
SparseRC: Learning Gene Regulatory Networks under Few-Root-Causes assumption
Differences in Mean Expression
CATRAN: Causal Transformer
Other proposed methods
Results and analysis
...and 6 more sections

Figures (2)

Figure 1: Performance comparison in terms of Mean Wasserstein Distance (unitless; y-axis) (top row) when varying the fraction of the full dataset size available for inference (in %; x-axis), and (bottom row) when varying the fraction of the full intervention set used (in %; x-axis). Markers indicate the values observed when running the respective algorithms with each of three random seeds, and colored lines indicate the median value observed across all tested random seeds for a method.
Figure 2: Performance comparison in terms of Precision (in %; y-axis) and Recall (in %; x-axis) in correctly identifying edges substantiated by biological interaction databases (left panels); and our own statistical evaluation using interventional information in terms of Wasserstein distance and false omission rate (FOR) (right panels). For each method, we show the mean and standard deviation from three independent runs. Baseline methods are in green, and the challenge methods are in pink.
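
The captions above name a Mean Wasserstein Distance metric without defining it on this page. As a rough illustration only, the sketch below assumes one plausible reading: for each predicted edge (source, target), compare the target gene's expression in control cells with its expression in cells where the source gene was perturbed, and average the resulting 1-D Wasserstein distances over all predicted edges. The function names, the dictionary layout, and the toy values are hypothetical and are not taken from the CausalBench code base.

```python
from bisect import bisect_right
from statistics import mean


def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two empirical 1-D samples.

    Computed as the integral of |F_x - F_y| over the pooled support,
    where F_x and F_y are the empirical CDFs of the two samples.
    """
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_x = bisect_right(xs, left) / len(xs)
        cdf_y = bisect_right(ys, left) / len(ys)
        total += abs(cdf_x - cdf_y) * (right - left)
    return total


def mean_wasserstein(predicted_edges, observational, interventional):
    """Average distributional shift over predicted edges (assumed definition).

    predicted_edges: iterable of (source_gene, target_gene) pairs
    observational:   dict target_gene -> expression values in control cells
    interventional:  dict source_gene -> dict target_gene -> expression
                     values in cells where source_gene was perturbed
    """
    distances = []
    for source, target in predicted_edges:
        if source not in interventional:
            continue  # source gene was never perturbed, so the edge cannot be scored
        distances.append(
            wasserstein_1d(observational[target], interventional[source][target])
        )
    return mean(distances) if distances else 0.0


# Toy data: perturbing hypothetical gene "A" shifts "B" but leaves "C" unchanged.
observational = {"B": [0.0, 0.0], "C": [1.0, 1.0]}
interventional = {"A": {"B": [1.0, 1.0], "C": [1.0, 1.0]}}
score = mean_wasserstein([("A", "B"), ("A", "C")], observational, interventional)
```

Under this reading, a higher score means the predicted edges point at gene pairs whose expression really does shift under perturbation. Edges whose source gene has no perturbation data are skipped here, which is one possible design choice for the partial-intervention setting shown in Figure 1.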
