MACAW: A Causal Generative Model for Medical Imaging
Vibujithan Vigneshwaran, Erik Ohara, Matthias Wilms, Nils Forkert
TL;DR
MACAW introduces a causal generative framework that embeds a structural causal graph into a single invertible normalizing flow via masked causal autoencoders (C-MADE). The model supports interventional sampling, counterfactual inference through abduction-action-prediction, and Bayesian classification, demonstrated on synthetic data and 23,692 UK Biobank brain MRI slices projected to 60 latent components. Results show accurate encoding of causal structure, realistic counterfactual generation, and meaningful changes related to brain aging, with age prediction evaluated using counterfactuals and Bayesian posterior ages. Limitations include dependence on a predefined causal graph, limited scalability to 3D data, and higher computational cost for class-conditioned inference, pointing to future work on end-to-end dimensionality reduction and 3D extensions for broader clinical applicability.
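The abduction-action-prediction procedure mentioned above can be illustrated with a minimal sketch on a two-variable toy model (age causes a scalar "volume" feature). This is not the MACAW implementation: the affine mechanism, the coefficients, and all function names are hypothetical and chosen only so that the three steps are easy to follow. The key assumption it shares with a normalizing flow is that the mechanism is invertible in its noise term.

```python
# Toy structural causal model: age -> volume.
# The mechanism and all coefficients below are hypothetical illustrations,
# not values or code from MACAW.
INTERCEPT = 100.0
SLOPE = -0.5
NOISE_SCALE = 2.0


def forward(age, noise):
    """Generate the effect variable from its cause and exogenous noise."""
    return INTERCEPT + SLOPE * age + NOISE_SCALE * noise


def abduct(age, volume):
    """Invert the mechanism to recover the noise consistent with an observation."""
    return (volume - INTERCEPT - SLOPE * age) / NOISE_SCALE


def counterfactual_volume(age, volume, new_age):
    """Abduction-action-prediction for a single observed subject."""
    noise = abduct(age, volume)      # abduction: infer subject-specific noise
    return forward(new_age, noise)   # action (set age) + prediction (regenerate)


observed_age, observed_volume = 60.0, 72.0
cf_volume = counterfactual_volume(observed_age, observed_volume, new_age=80.0)
print(cf_volume)
```

Because the recovered noise is reused unchanged, setting `new_age` to the observed age reproduces the observed value exactly, which is a useful sanity check for any invertible counterfactual model.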
Abstract
Although deep learning techniques show promising results for many neuroimaging tasks in research settings, they have not yet found widespread use in clinical scenarios. One reason is that many machine learning models only identify correlations between the input images and the outputs of interest, which can lead to practical problems such as the encoding of uninformative biases and reduced explainability. Recent research is therefore exploring whether integrating a priori causal knowledge into deep learning models is a potential avenue to identify these problems. This work introduces a new causal generative architecture named Masked Causal Flow (MACAW) for neuroimaging applications. Within this context, three main contributions are described. First, a novel approach that integrates complex causal structures into normalizing flows is proposed. Second, counterfactual prediction is performed to identify the changes in effect variables associated with a cause variable. Finally, an explicit Bayesian inference for classification is derived and implemented, providing an inherent uncertainty estimation. The feasibility of the proposed method was first evaluated using synthetic data and then using brain MRI data from more than 23,000 participants of the UK Biobank study. The evaluation results show that the proposed method can (1) accurately encode causal reasoning and generate counterfactuals highlighting the structural changes in the brain known to be associated with aging, (2) accurately predict a subject's age from a single 2D MRI slice, and (3) generate new samples assuming other values for subject-specific indicators such as age, sex, and body mass index. The code for a toy dataset is available at the following link: https://github.com/vibujithan/macaw-2D.git.
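To make the idea of explicit Bayesian inference for classification concrete, the following is a minimal sketch of how a generative model's conditional likelihood can be turned into a posterior over a discrete cause variable via Bayes' rule, p(age | x) ∝ p(x | age) p(age). It is not the derivation used in the paper: the Gaussian toy likelihood, the uniform prior, and all names are assumptions made for illustration. In a flow-based model, the likelihood term would instead come from the model's exact log-density.

```python
import math


def log_gaussian(x, mean, std):
    """Log-density of a univariate normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - 0.5 * ((x - mean) / std) ** 2


def posterior(x, candidates, log_likelihood, log_prior):
    """Normalized posterior over candidate classes, computed stably in log space."""
    log_joint = [log_likelihood(x, c) + log_prior(c) for c in candidates]
    shift = max(log_joint)  # subtract the maximum before exponentiating
    weights = [math.exp(v - shift) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical likelihood: feature ~ Normal(100 - 0.5 * age, 2).
def toy_log_likelihood(x, age):
    return log_gaussian(x, 100.0 - 0.5 * age, 2.0)


# Hypothetical uniform prior over the candidate ages (constant log-prior).
def uniform_log_prior(age):
    return 0.0


ages = list(range(40, 81))
post = posterior(70.0, ages, toy_log_likelihood, uniform_log_prior)

map_age = ages[post.index(max(post))]                   # most probable age
mean_age = sum(a * p for a, p in zip(ages, post))       # posterior mean
std_age = math.sqrt(sum(p * (a - mean_age) ** 2 for a, p in zip(ages, post)))
print(map_age, round(mean_age, 2), round(std_age, 2))
```

The spread of the posterior (`std_age` here) is what provides the uncertainty estimate: a flat posterior signals that the observation does not discriminate well between candidate ages.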
