Cascade-based Randomization for Inferring Causal Effects under Diffusion Interference
Zahra Fatemi, Jean Pouget-Abadie, Elena Zheleva
TL;DR
This work addresses causal effect estimation under network interference caused by cascades. It introduces Cascade-Based Randomization (CasBR), which seeds treatment assignment at cascade origins and propagates it to multi-hop neighbors to mitigate multi-hop interference under the Independent Cascade Model. Empirical results on real and synthetic networks show that CasBR and its post-processing variant consistently achieve lower RMSE in total treatment effect estimation than cluster-based baselines, often by substantial margins. The approach highlights the value of cascade seed information for causal inference in diffusion-heavy networks and suggests extending cascade-aware designs to other diffusion models and to further analysis of estimation bias.
Abstract
The presence of interference, where the outcome of an individual may depend on the treatment assignment and behavior of neighboring nodes, can lead to biased causal effect estimation. Current approaches to network experiment design focus on limiting interference through cluster-based randomization, in which clusters are identified using graph clustering and nodes are assigned to treatment or control at the cluster level. However, cluster-based randomization performs poorly when interference propagates in cascades, whereby the response of individuals to treatment propagates to their multi-hop neighbors. When the cascade seed nodes are known, we can leverage this interference structure to mitigate the resulting bias in causal effect estimation. With this goal, we propose a cascade-based network experiment design that initiates treatment assignment at the cascade seed nodes and propagates each seed's assignment to its multi-hop neighbors, limiting interference during cascade growth and thereby reducing the overall causal effect estimation error. Our extensive experiments on real-world and synthetic datasets demonstrate that the proposed framework outperforms existing state-of-the-art approaches in estimating causal effects in network data.
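To make the design described above concrete, the following is a minimal sketch of seed-initiated assignment, not the authors' CasBR algorithm. It assumes an undirected graph stored as an adjacency dictionary, a known list of seed nodes, and a hypothetical `max_hops` parameter that bounds how far a seed's assignment spreads. The function name `cascade_based_assignment`, the first-come tie-breaking between competing seeds, and the independent randomization of unreached nodes are all illustrative assumptions.

```python
import random
from collections import deque


def cascade_based_assignment(adj, seeds, max_hops, rng):
    """Illustrative sketch: randomize seeds, then copy each seed's arm to nearby nodes.

    adj      -- dict mapping node -> iterable of neighbors (assumed undirected)
    seeds    -- known cascade seed nodes
    max_hops -- assumed bound on how many hops an assignment is propagated
    rng      -- random.Random instance, for reproducibility
    Returns a dict node -> 1 (treatment) or 0 (control).
    """
    assignment = {}
    frontier = deque()

    # Step 1: each seed is independently assigned to treatment or control.
    for seed in seeds:
        assignment[seed] = rng.randint(0, 1)
        frontier.append((seed, 0))

    # Step 2: multi-source breadth-first search. A node inherits the arm of
    # whichever seed reaches it first (assumed tie-breaking rule).
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for neighbor in adj[node]:
            if neighbor not in assignment:
                assignment[neighbor] = assignment[node]
                frontier.append((neighbor, depth + 1))

    # Step 3: nodes no seed reached are randomized independently (assumption).
    for node in adj:
        if node not in assignment:
            assignment[node] = rng.randint(0, 1)
    return assignment


# Usage on a 7-node path graph 0-1-2-3-4-5-6 with seeds at both ends.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
seeds = [0, 6]
assignment = cascade_based_assignment(adj, seeds, max_hops=2, rng=random.Random(0))
```

In this toy example, nodes 1 and 2 share the arm of seed 0, nodes 4 and 5 share the arm of seed 6, and node 3 lies beyond both two-hop neighborhoods, so it is randomized on its own. Keeping each seed's likely cascade region in a single arm is the intuition the abstract describes for limiting interference during cascade growth.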
