DoubleMLDeep: Estimation of Causal Effects with Multimodal Data
Sven Klaassen, Jan Teichert-Kluge, Philipp Bach, Victor Chernozhukov, Martin Spindler, Suhas Vijaykumar
TL;DR
The paper addresses causal effect estimation when the confounders include unstructured multimodal data (text and images). It does so by extending double machine learning (DML) for the partially linear regression model to nuisance functions of multimodal inputs. It introduces a middle-fusion neural architecture to estimate the nuisance functions $l_0(X)=\mathbb{E}[Y|X]$ and $m_0(X)=\mathbb{E}[D|X]$, and it relies on the Neyman orthogonality of the score $\psi(W,\theta,\hat{\eta})$. Under the usual DML conditions on the nuisance estimators, this yields a root-$N$ consistent estimator $\hat{\theta}$. A semi-synthetic data generator built from tabular, text, and image datasets is developed to validate inference under controlled confounding. In the experiments, the nuisance fits reach an $R^2$ of about 0.88–0.90, and the treatment-effect estimates $\hat{\theta}$ lie near the true $\theta_0=0.5$, whereas a tabular-only baseline is substantially biased. This suggests that multimodal information can improve causal estimation in economics, marketing, medicine, and beyond. The work also outlines extensions to other nonparametric causal models and to additional unstructured data types as directions for future research.
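For reference, the partially linear model behind $l_0$ and $m_0$ and the partialling-out form of the orthogonal score can be written as follows. This is the standard DML formulation, not a quotation from the paper. The symbols $g_0$, $V$, $\varepsilon$, $\hat{U}_i$ and $\hat{V}_i$ are introduced here for illustration.

$$
\begin{aligned}
Y &= \theta_0 D + g_0(X) + \varepsilon, && \mathbb{E}[\varepsilon \mid D, X] = 0,\\
D &= m_0(X) + V, && \mathbb{E}[V \mid X] = 0,\\
l_0(X) &= \mathbb{E}[Y \mid X] = \theta_0\, m_0(X) + g_0(X),\\
\psi(W,\theta,\eta) &= \bigl(Y - l(X) - \theta\,(D - m(X))\bigr)\bigl(D - m(X)\bigr), && \eta = (l, m),\ W = (Y, D, X),\\
\hat{\theta} &= \frac{\sum_{i=1}^{N} \hat{V}_i \hat{U}_i}{\sum_{i=1}^{N} \hat{V}_i^{2}}, && \hat{U}_i = Y_i - \hat{l}(X_i),\ \hat{V}_i = D_i - \hat{m}(X_i).
\end{aligned}
$$

Here $\hat{\theta}$ solves $\frac{1}{N}\sum_{i}\psi(W_i,\theta,\hat{\eta}) = 0$, with $\hat{\eta} = (\hat{l}, \hat{m})$ estimated out-of-fold (cross-fitting). Orthogonality means that small errors in $\hat{\eta}$ affect this moment condition only to second order.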
Abstract
This paper explores the use of unstructured, multimodal data, namely text and images, in causal inference and treatment effect estimation. We propose a neural network architecture that is adapted to the double machine learning (DML) framework, specifically the partially linear model. An additional contribution of our paper is a new method to generate a semi-synthetic dataset which can be used to evaluate the performance of causal effect estimation in the presence of text and images as confounders. The proposed methods and architectures are evaluated on the semi-synthetic dataset and compared to standard approaches, highlighting the potential benefit of using text and images directly in causal studies. Our findings have implications for researchers and practitioners in economics, marketing, finance, medicine and data science in general who are interested in estimating causal quantities using non-traditional data.
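The DML estimator for the partially linear model can be illustrated with a minimal, self-contained sketch of cross-fitting on simulated tabular data. This is not the paper's implementation. The function names (`poly_features`, `fit_predict`, `dml_plr`) and the data-generating process are hypothetical, and a degree-2 polynomial least-squares fit stands in for the multimodal neural network as the nuisance learner. The naive slope of $Y$ on $D$ plays the role of a confounded baseline.

```python
import numpy as np

def poly_features(X):
    """Degree-2 polynomial basis (with intercept) for a 2-column X (hypothetical helper)."""
    x0, x1 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x0, x1, x0**2, x1**2, x0 * x1])

def fit_predict(X_train, y_train, X_test):
    """Stand-in nuisance learner: least squares on the polynomial basis.
    In the multimodal setting this would be a learner that consumes text/image inputs."""
    coef, *_ = np.linalg.lstsq(poly_features(X_train), y_train, rcond=None)
    return poly_features(X_test) @ coef

def dml_plr(Y, D, X, n_folds=5, seed=0):
    """Cross-fitted DML estimate of theta_0 using the partialling-out score."""
    n = len(Y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # random, balanced fold labels
    y_res = np.empty(n)
    d_res = np.empty(n)
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # Out-of-fold residuals Y - l_hat(X) and D - m_hat(X)
        y_res[test] = Y[test] - fit_predict(X[train], Y[train], X[test])
        d_res[test] = D[test] - fit_predict(X[train], D[train], X[test])
    # Root of the empirical moment mean(psi) = 0
    theta = np.sum(d_res * y_res) / np.sum(d_res**2)
    # Plug-in standard error based on the score
    psi = (y_res - theta * d_res) * d_res
    J = np.mean(d_res**2)
    se = np.sqrt(np.mean(psi**2) / J**2 / n)
    return theta, se

# Hypothetical simulated data: X confounds both D and Y; true theta_0 = 0.5
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 2))
m0 = 0.8 * X[:, 0] + 0.5 * X[:, 1] ** 2  # E[D | X]
g0 = X[:, 0] + 0.5 * X[:, 1] ** 2        # confounding path into Y
D = m0 + rng.normal(size=n)
Y = 0.5 * D + g0 + rng.normal(size=n)

naive = np.polyfit(D, Y, 1)[0]  # slope of Y on D, ignoring X
theta_hat, se = dml_plr(Y, D, X)
print(f"naive: {naive:.3f}  DML: {theta_hat:.3f} (se {se:.3f})")
```

Because $X$ drives both $D$ and $Y$, the naive slope absorbs the confounding path through $g_0$. Residualizing both variables on out-of-fold predictions removes that path, which is the role the multimodal nuisance estimators play when the confounders are text and images.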
