Towards Reliable Social A/B Testing: Spillover-Contained Clustering with Robust Post-Experiment Analysis
Xu Min, Zhaoxu Yang, Kaixuan Tan, Juan Yan, Xunbin Xiong, Zihao Zhu, Kaiyu Zhu, Fenglin Cui, Yang Yang, Sihua Yang, Jianhui Bu
TL;DR
This work tackles network interference in social A/B testing by introducing a two-stage framework that combines spillover-contained design with robust post-experiment inference. It develops Balanced Louvain, a cluster-building method tailored to minimize cross-cluster spillovers while maintaining size balance and temporal stability, and couples it with CUPAC-based variance reduction to regain the statistical power lost under cluster randomization. The approach is validated through simulations and large-scale Kuaishou experiments, showing substantial spillover containment and more accurate treatment-effect estimates than traditional user-level designs. The resulting workflow supports reliable, scalable networked experimentation in production and informs how social strategies are evaluated on large platforms.
Abstract
A/B testing is the foundation of decision-making on online platforms, yet social products often suffer from network interference: user interactions cause treatment effects to spill over into the control group. Such spillovers bias causal estimates and undermine experimental conclusions. Existing approaches face key limitations: user-level randomization ignores network structure, while cluster-based methods often rely on general-purpose clustering that is not tailored for spillover containment and struggle to balance unbiasedness against statistical power at scale. We propose a spillover-contained experimentation framework with two stages. In the pre-experiment stage, we build social interaction graphs and introduce a Balanced Louvain algorithm that produces stable, size-balanced clusters while minimizing cross-cluster edges, enabling reliable cluster-based randomization. In the post-experiment stage, we develop a tailored CUPAC estimator that leverages pre-experiment behavioral covariates to reduce the variance induced by cluster-level assignment, thereby improving statistical power. Together, these components provide both structural spillover containment and robust statistical inference. We validate our approach through large-scale social sharing experiments on Kuaishou, a platform serving hundreds of millions of users. Results show that our method substantially reduces spillover and yields more accurate assessments of social strategies than traditional user-level designs, establishing a reliable and scalable framework for networked A/B testing.
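The two stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it does not perform Balanced Louvain clustering (a cluster assignment is taken as given), and it substitutes a simple single-covariate linear adjustment for the tailored CUPAC estimator, which, as the term is commonly used, takes a model's prediction from pre-experiment data as the covariate. All function names (`cross_cluster_edge_fraction`, `assign_clusters`, `covariate_adjusted_effect`) are hypothetical and introduced only for illustration.

```python
import random
from statistics import mean


def cross_cluster_edge_fraction(edges, cluster_of):
    """Share of edges whose endpoints lie in different clusters.

    A rough proxy for potential spillover: lower means better containment.
    """
    if not edges:
        return 0.0
    crossing = sum(1 for u, v in edges if cluster_of[u] != cluster_of[v])
    return crossing / len(edges)


def assign_clusters(cluster_ids, seed=0):
    """Randomize whole clusters: half to treatment (1), the rest to control (0)."""
    rng = random.Random(seed)
    ids = sorted(cluster_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {c: (1 if i < half else 0) for i, c in enumerate(ids)}


def covariate_adjusted_effect(y, x, t):
    """Difference in means after a linear covariate adjustment.

    y: cluster-level outcomes, x: pre-experiment covariate (assumed here to be
    a prediction or historical metric), t: 0/1 cluster assignments.
    The slope theta = cov(x, y) / var(x) is estimated on the pooled sample.
    """
    x_bar, y_bar = mean(x), mean(y)
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    theta = cov_xy / var_x if var_x > 0 else 0.0
    adjusted = [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]
    treated = [a for a, ti in zip(adjusted, t) if ti == 1]
    control = [a for a, ti in zip(adjusted, t) if ti == 0]
    return mean(treated) - mean(control)


if __name__ == "__main__":
    # Toy graph: two triangles-like groups joined by a single bridge edge.
    edges = [(0, 1), (1, 2), (3, 4), (4, 5), (2, 3)]
    cluster_of = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    print(cross_cluster_edge_fraction(edges, cluster_of))
    print(assign_clusters(set(cluster_of.values()), seed=1))
```

In this sketch, the cross-cluster edge fraction plays the role of the quantity the clustering stage tries to keep small, and the adjustment step shows why a covariate that predicts the outcome well can recover some of the precision lost when randomizing clusters instead of users.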
