I See, Therefore I Do: Estimating Causal Effects for Image Treatments
Abhinav Thorat, Ravi Kolla, Niranjan Pedanekar
TL;DR
This work addresses estimating individualized causal effects when treatments are images by introducing NICE, a framework that learns joint representations of user covariates and image treatments and uses dedicated treatment-head networks to predict potential outcomes. The model optimizes a regression loss together with a maximum mean discrepancy (MMD) based regularizer to mitigate treatment assignment bias, yielding $\hat{y}_t^i$ via $\hat{y}_t^i = \pi_t\big(\Phi(\mathbf{x}_i), \Psi(\Lambda(I_t))\big)$ and training with $\mathcal{L}=\alpha\mathcal{L}_1+\beta\mathcal{L}_2$, where $\mathcal{L}_1$ is the observed-outcome loss and $\mathcal{L}_2$ encourages balanced joint embeddings. To evaluate NICE, the authors construct semi-synthetic datasets using PosterLens embeddings and generate potential outcomes with a multiplicative structure $y_i^t=c\tilde{y}_i^t d_i^t$, enabling zero-shot and skewed-treatment experiments. Across $k=4,8,16$ treatments, NICE consistently achieves lower root PEHE than adapted baselines, and shows robustness to unseen treatments via zero-shot evaluation. The results demonstrate the value of exploiting rich image-treatment information for accurate ITE estimation and have practical implications for personalization in content and product displays; future work includes scalability to larger multimodal settings and application to real-world data.
Abstract
Causal effect estimation under observational studies is challenging due to the lack of ground truth data and treatment assignment bias. Though various methods exist in literature for addressing this problem, most of them ignore multi-dimensional treatment information by considering it as scalar, either continuous or discrete. Recently, certain works have demonstrated the utility of this rich yet complex treatment information into the estimation process, resulting in better causal effect estimation. However, these works have been demonstrated on either graphs or textual treatments. There is a notable gap in existing literature in addressing higher dimensional data such as images that has a wide variety of applications. In this work, we propose a model named NICE (Network for Image treatments Causal effect Estimation), for estimating individual causal effects when treatments are images. NICE demonstrates an effective way to use the rich multidimensional information present in image treatments that helps in obtaining improved causal effect estimates. To evaluate the performance of NICE, we propose a novel semi-synthetic data simulation framework that generates potential outcomes when images serve as treatments. Empirical results on these datasets, under various setups including the zero-shot case, demonstrate that NICE significantly outperforms existing models that incorporate treatment information for causal effect estimation.
