The CausalBench challenge: A machine learning contest for gene network inference from single-cell perturbation data
Mathieu Chevalley, Jacob Sackett-Sanders, Yusuf Roohani, Pascal Notin, Artemy Bakulin, Dariusz Brzezinski, Kaiwen Deng, Yuanfang Guan, Justin Hong, Michael Ibrahim, Wojciech Kotlowski, Marcin Kowiel, Panagiotis Misiakos, Achille Nazaret, Markus Püschel, Chris Wendler, Arash Mehrjou, Patrick Schwab
TL;DR
The paper introduces the CBC2023 challenge, a community effort to advance gene network inference from real-world single-cell perturbation data using the CausalBench benchmark. It analyzes submissions that combine interventional signals with causally principled reasoning, including BetterBoost, Guanlab, SparseRC, MeanDifference, and CATRAN, and reports significant gains both in leveraging interventional data and in balancing distributional-change metrics. By standardizing evaluation on large-scale perturbational datasets (RPE1 and K562) that contain both observational and interventional data, the work shows that current methods can surpass prior baselines and that causality-grounded approaches hold particular promise under partial perturbations. The findings highlight the potential impact on drug discovery and causal genomics, and encourage broader participation and continued methodological innovation in real-world causal network inference.
Abstract
In drug discovery, mapping interactions between genes within cellular systems is a crucial early step. Such maps are foundational for understanding the molecular mechanisms underlying disease biology and pivotal for formulating hypotheses about potential targets for new medicines. The CausalBench Challenge was initiated in recognition of the need to improve the construction of these gene-gene interaction networks, especially from large-scale, real-world datasets of perturbed single cells. The challenge aimed to inspire the machine learning community to advance state-of-the-art methods, with an emphasis on making better use of large-scale genetic perturbation data. Using the framework provided by the CausalBench benchmark, participants were tasked with refining existing methods or proposing new ones. This report analyzes and summarizes the methods submitted during the challenge, giving a partial picture of the state of the art at that time. Notably, the winning solutions significantly improved performance compared to previous baselines, establishing a new state of the art for this critical task in biology and medicine.
