Cascade-based Randomization for Inferring Causal Effects under Diffusion Interference

Zahra Fatemi, Jean Pouget-Abadie, Elena Zheleva

TL;DR

This work addresses causal effect estimation under network interference caused by cascades. It introduces Cascade-Based Randomization (CasBR), which seeds treatment at cascade origins and propagates assignments to multi-hop neighbors to mitigate multi-hop interference under the Independent Cascade Model. Empirical results across real and synthetic networks show CasBR and its post-processing variant consistently achieve lower RMSE in total treatment effect estimation compared to cluster-based baselines, often by substantial margins. The approach highlights the value of leveraging cascade seed information to improve causal inference in diffusion-heavy networks and suggests directions for extending cascade-aware designs to other diffusion models and bias considerations.
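
For readers unfamiliar with the Independent Cascade Model mentioned above, the following is a minimal sketch of one simulation run in its textbook form: each newly activated node gets a single chance to activate each inactive neighbor. The function name `independent_cascade`, the adjacency-dict input, and the single edge probability `p` are assumptions made for illustration; they are not taken from the paper, which may use assignment-dependent activation probabilities.

```python
import random


def independent_cascade(adj, seeds, p, rng):
    """Simulate one run of the textbook Independent Cascade Model.

    adj   : dict mapping each node to a list of its neighbors
    seeds : iterable of initially active nodes
    p     : activation probability per edge (one shared value, an assumption)
    rng   : random.Random instance, passed in for reproducibility
    Returns a list of sets; entry t holds the nodes newly activated at step t.
    """
    frontier = list(dict.fromkeys(seeds))  # de-duplicate, keep order
    active = set(frontier)
    steps = [set(frontier)]
    while frontier:
        newly_active = []
        for u in frontier:
            for v in adj[u]:
                # Each active node gets exactly one attempt per inactive neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        if newly_active:
            steps.append(set(newly_active))
        frontier = newly_active
    return steps
```

With `p = 1.0` the cascade reduces to a breadth-first traversal from the seeds, which is a convenient sanity check before trying smaller probabilities.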

Abstract

The presence of interference, where the outcome of an individual may depend on the treatment assignment and behavior of neighboring nodes, can lead to biased causal effect estimation. Current approaches to network experiment design focus on limiting interference through cluster-based randomization, in which clusters are identified using graph clustering, and cluster randomization dictates the node assignment to treatment and control. However, cluster-based randomization approaches perform poorly when interference propagates in cascades, whereby the response of individuals to treatment propagates to their multi-hop neighbors. When we have knowledge of the cascade seed nodes, we can leverage this interference structure to mitigate the resulting causal effect estimation bias. With this goal, we propose a cascade-based network experiment design that initiates treatment assignment from the cascade seed nodes and propagates the assignment to their multi-hop neighbors to limit interference during cascade growth and thereby reduce the overall causal effect estimation error. Our extensive experiments on real-world and synthetic datasets demonstrate that our proposed framework outperforms existing state-of-the-art approaches in estimating causal effects in network data.
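
The seed-then-propagate idea described in the abstract can be illustrated with a short sketch. This is not the paper's algorithm: it is one plausible reading, assuming each seed receives an independent fair coin flip, the assignment spreads by multi-source breadth-first search so that every node inherits the label of the first seed to reach it, and any node left unreached is randomized independently. The names `cascade_seeded_assignment` and `max_hops` are hypothetical.

```python
import random
from collections import deque


def cascade_seeded_assignment(adj, seeds, rng, max_hops=None):
    """Assign treatment (1) or control (0) by propagating from seed nodes.

    adj      : dict mapping each node to a list of its neighbors
    seeds    : iterable of known cascade seed nodes
    rng      : random.Random instance
    max_hops : optional limit on how far a seed's label spreads (assumption)
    """
    assignment = {}
    queue = deque()
    for s in seeds:
        if s not in assignment:
            assignment[s] = rng.randint(0, 1)  # fair coin per seed (assumption)
            queue.append((s, 0))
    # Multi-source BFS: a node copies the label of the node that reaches it first.
    while queue:
        u, dist = queue.popleft()
        if max_hops is not None and dist >= max_hops:
            continue
        for v in adj[u]:
            if v not in assignment:
                assignment[v] = assignment[u]
                queue.append((v, dist + 1))
    # Nodes never reached from any seed fall back to independent randomization.
    for v in adj:
        if v not in assignment:
            assignment[v] = rng.randint(0, 1)
    return assignment
```

Under these assumptions, every node within reach of a seed shares that seed's assignment, so an edge can cross the treatment-control boundary only where the regions of two differently assigned seeds meet or where a fallback-randomized node is involved.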

Paper Structure (21 sections, 6 equations, 7 figures, 2 tables, 2 algorithms)

This paper contains 21 sections, 6 equations, 7 figures, 2 tables, 2 algorithms.

Figures (7)

  • Figure 1: The resulting node assignment of three different network experiment designs for minimizing interference in the Independent Cascade Model. The CBR approach leaves more edges between treatment and control nodes than our CasBR method (9 vs. 5 edges), which indicates more interference in the experiment.
  • Figure 2: RMSE of total treatment effect estimation of different methods on real-world and synthetic datasets at the last time step of the cascade propagation; bars show the standard deviation of the estimated effects. Seed nodes are selected using the random sampling method. In all datasets, $10\%$ of the nodes are considered as the cascade seeds. CasBR and CasBR-post achieve the best performance in all datasets.
  • Figure 3: RMSE of total treatment effect estimation of different methods at the last time step of the cascade propagation; bars show the standard deviation of the estimated effects. Seed nodes are selected using the random sampling method. In all datasets, $10\%$ of the nodes are considered as the cascade seeds. We set $p_{t-t}=p_{t-c}=0.07$ and $p_{c-c}=p_{c-t}=0.05$.
  • Figure 4: RMSE of total treatment effect estimation of different methods at consecutive time steps of the cascade propagation; bars show the standard deviation of the estimated effects. Seed nodes are selected using the random sampling method. CasBR and CasBR-post achieve the lowest estimation error across time steps.
  • Figure 5: Comparison between RMSE of total treatment effect estimation of CasBR and CBR(reLDG); cascade seed nodes are selected using the random sampling approach. The number of clusters in CBR(reLDG) is equal to the number of cascade seeds.
  • ...and 2 more figures
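
The Figure 1 caption compares designs by the number of edges that join a treatment node to a control node. A minimal sketch of that count is given here, assuming an undirected edge list and an assignment dictionary; the function name `count_cross_edges` and the toy graph are hypothetical and do not reproduce the graph in the figure.

```python
def count_cross_edges(edges, assignment):
    """Count edges whose endpoints received different assignments.

    edges      : iterable of (u, v) pairs, each undirected edge listed once
    assignment : dict mapping node -> 0 (control) or 1 (treatment)
    """
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


# Toy 4-cycle: nodes 0 and 1 treated, nodes 2 and 3 in control.
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
toy_assignment = {0: 1, 1: 1, 2: 0, 3: 0}
cross = count_cross_edges(toy_edges, toy_assignment)  # edges (1, 2) and (3, 0) cross
```

A lower count means fewer direct paths along which a treated node's response can reach a control node, which is the sense in which the caption reads fewer cross edges as less interference.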