A Two-Part Machine Learning Approach to Characterizing Network Interference in A/B Testing
Yuan Yuan, Kristen M. Altenburger
TL;DR
This work tackles the bias and variance caused by network interference in A/B testing by proposing a two-part machine learning framework that automates exposure mapping. The first part (g) uses causal network motifs to form motif representations; the second part (h) is a clustering-based mapper that defines exposure regions. Together they enable robust estimation of average potential outcomes and the global average treatment effect with Horvitz–Thompson or Hájek estimators. The method is validated in simulations on Watts-Strogatz and Slashdot networks and in a large-scale Instagram A/B test, showing reduced bias and improved inference relative to traditional design-based and exposure-mapping approaches. In practice, the approach gives practitioners a scalable, interpretable, and automated tool to diagnose interference patterns, inform experiment design, and improve decision-making in marketing and product optimization.
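As a rough illustration of the two estimators named above, the sketch below computes the Horvitz–Thompson and Hájek estimates of the average potential outcome for a single exposure condition. It is a minimal example in their standard textbook forms, not the authors' implementation: the function names, argument names, and toy numbers are hypothetical, and each unit's probability of falling into the exposure condition is assumed to be known from the randomization design.

```python
import numpy as np


def horvitz_thompson(y, in_condition, prob):
    """Horvitz-Thompson estimate of the mean potential outcome.

    y            : observed outcomes, one per unit
    in_condition : 1 if the unit was observed in the exposure condition, else 0
    prob         : assumed-known probability that the unit lands in the condition
                   (must be > 0 for every unit observed in the condition)
    """
    y = np.asarray(y, dtype=float)
    p = np.asarray(prob, dtype=float)
    keep = np.asarray(in_condition, dtype=bool)
    # Inverse-probability-weighted sum, normalized by the total number of units.
    return float(np.sum(y[keep] / p[keep]) / len(y))


def hajek(y, in_condition, prob):
    """Hajek estimate: same weights, normalized by the sum of the weights."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(prob, dtype=float)
    keep = np.asarray(in_condition, dtype=bool)
    w = 1.0 / p[keep]
    return float(np.sum(w * y[keep]) / np.sum(w))


# Toy data (hypothetical): four units, two observed in the condition.
y = [1.0, 2.0, 3.0, 4.0]
in_condition = [1, 0, 1, 0]
prob = [0.5, 0.5, 0.25, 0.25]

ht_estimate = horvitz_thompson(y, in_condition, prob)
hajek_estimate = hajek(y, in_condition, prob)
```

The only difference between the two is the normalization. Horvitz–Thompson divides by the number of units, while Hájek divides by the sum of the inverse-probability weights, so a constant outcome is always estimated as that constant under Hájek.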
Abstract
The reliability of controlled experiments, commonly referred to as "A/B tests," is often compromised by network interference, where the outcomes of individual units are influenced by interactions with others. Key challenges in this domain include accounting for complex social network structures and suitably characterizing network interference. To address these challenges, we propose a machine learning-based method. We introduce "causal network motifs" and use transparent machine learning models to characterize the network interference patterns underlying an A/B test on networks. We demonstrate the method's performance through simulations on a synthetic experiment and a large-scale test on Instagram. Our experiments show that our approach outperforms conventional methods such as design-based cluster randomization and analysis-based neighborhood exposure mapping. Our approach provides a comprehensive and automated solution to network interference for A/B testing practitioners, which helps inform strategic business decisions in areas such as marketing effectiveness and product customization.
