Towards Reliable Social A/B Testing: Spillover-Contained Clustering with Robust Post-Experiment Analysis

Xu Min, Zhaoxu Yang, Kaixuan Tan, Juan Yan, Xunbin Xiong, Zihao Zhu, Kaiyu Zhu, Fenglin Cui, Yang Yang, Sihua Yang, Jianhui Bu

TL;DR

This work tackles network interference in social A/B testing by introducing a two-stage framework that combines spillover-contained design with robust post-experiment inference. It develops Balanced Louvain, a cluster-building method tailored to minimize cross-cluster spillovers while maintaining size balance and temporal stability, and couples it with CUPAC-based variance reduction to regain statistical power under cluster randomization. The approach is validated through simulations and large-scale Kuaishou experiments, showing substantial spillover containment and more accurate treatment-effect estimates than traditional user-level designs. The proposed workflow enables reliable, scalable networked experimentation with practical production deployments and clear implications for evaluating social strategies on large platforms.

Abstract

A/B testing is the foundation of decision-making in online platforms, yet social products often suffer from network interference: user interactions cause treatment effects to spill over into the control group. Such spillovers bias causal estimates and undermine experimental conclusions. Existing approaches face key limitations: user-level randomization ignores network structure, while cluster-based methods often rely on general-purpose clustering that is not tailored for spillover containment and has difficulty balancing unbiasedness and statistical power at scale. We propose a spillover-contained experimentation framework with two stages. In the pre-experiment stage, we build social interaction graphs and introduce a Balanced Louvain algorithm that produces stable, size-balanced clusters while minimizing cross-cluster edges, enabling reliable cluster-based randomization. In the post-experiment stage, we develop a tailored CUPAC estimator that leverages pre-experiment behavioral covariates to reduce the variance induced by cluster-level assignment, thereby improving statistical power. Together, these components provide both structural spillover containment and robust statistical inference. We validate our approach through large-scale social sharing experiments on Kuaishou, a platform serving hundreds of millions of users. Results show that our method substantially reduces spillover and yields more accurate assessments of social strategies than traditional user-level designs, establishing a reliable and scalable framework for networked A/B testing.
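The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy cross-cluster weight metric, and the assumption that a pre-experiment prediction is already available for each cluster are all hypothetical. The adjustment shown is the generic CUPAC-style linear correction (outcome minus a scaled, centered prediction), not the tailored estimator developed in the paper, and the clustering step itself (Balanced Louvain) is not reproduced here.

```python
import random
from statistics import mean, pvariance


def assign_clusters(cluster_ids, seed=0):
    """Randomize whole clusters: half to treatment (1), the rest to control (0)."""
    rng = random.Random(seed)
    ids = sorted(cluster_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {cid: (1 if i < half else 0) for i, cid in enumerate(ids)}


def cross_cluster_weight_fraction(edges, cluster_of):
    """Share of total edge weight that connects users in different clusters.

    Hypothetical diagnostic: lower values mean fewer interactions can leak
    across cluster boundaries.
    """
    total = sum(w for _, _, w in edges)
    cross = sum(w for u, v, w in edges if cluster_of[u] != cluster_of[v])
    return cross / total if total else 0.0


def cupac_adjust(outcomes, predictions):
    """Generic CUPAC-style adjustment on cluster-level values.

    Y_adj = Y - theta * (P - mean(P)), with theta = cov(Y, P) / var(P).
    `predictions` are assumed to come from a model fit on pre-experiment data.
    """
    y_bar, p_bar = mean(outcomes), mean(predictions)
    cov = sum((y - y_bar) * (p - p_bar) for y, p in zip(outcomes, predictions))
    var = sum((p - p_bar) ** 2 for p in predictions)
    theta = cov / var if var else 0.0
    return [y - theta * (p - p_bar) for y, p in zip(outcomes, predictions)]


# Toy data: four users in two clusters, weighted interaction edges.
cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1}
edges = [("a", "b", 2.0), ("b", "c", 1.0), ("c", "d", 3.0)]
leak = cross_cluster_weight_fraction(edges, cluster_of)

# Toy cluster-level outcomes and their pre-experiment predictions.
outcomes = [10.0, 12.0, 9.0, 15.0, 11.0, 14.0]
predictions = [9.5, 11.0, 9.0, 14.0, 10.5, 13.0]
adjusted = cupac_adjust(outcomes, predictions)
arms = assign_clusters(range(len(outcomes)), seed=42)
```

Because the centered predictions sum to zero, the adjustment leaves the overall mean unchanged while removing the part of the outcome variance that the prediction explains, which is the mechanism by which power lost to cluster-level assignment can be partly recovered.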

Paper Structure (60 sections, 15 equations, 5 figures, 8 tables, 2 algorithms)

Figures (5)

  • Figure 1: Spillover effects in social A/B testing: treated users influence control users through interactions (e.g., sharing), biasing effect estimates.
  • Figure 2: Overview of our proposed framework for reliable social A/B testing. The system consists of two main stages: (1) pre-experiment graph construction and clustering to minimize cross-cluster spillovers; (2) post-experiment cluster-based randomization and effect estimation with variance reduction (CUPAC) for more sensitive detection.
  • Figure 3: WGSR vs. observed ATE (left) and bias (right). Cluster-based assignment reduces bias by 87–89%. Horizontal dotted lines indicate the true ATE. Stars: extrapolation to WGSR = 1.0 recovers the true ATE within 0.1%.
  • Figure 4: Illustration of our joint cluster-based (CBR) and user-based (UBR) experimental design. We first allocate part of the traffic via CBR to contain spillover. The remaining traffic is then assigned via UBR, enabling a parallel comparison.
  • Figure 5: LT-7 lift comparison of cluster-based (CBR) vs. user-based (UBR) randomization under the sharing strategy.
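
The joint design in Figure 4 can be sketched roughly as follows. This is an illustrative guess at the mechanics, not the production allocation logic: the function name `joint_assignment`, the `cbr_share` parameter, and the choice to carve out the CBR portion by whole clusters are all assumptions made for the example.

```python
import random


def joint_assignment(cluster_of, cbr_share=0.5, seed=0):
    """Split traffic between cluster-based (CBR) and user-based (UBR) randomization.

    cluster_of: dict mapping user id -> cluster id.
    cbr_share:  fraction of clusters routed to the CBR portion (hypothetical knob).
    Returns a dict mapping user id -> (design, arm).
    """
    rng = random.Random(seed)
    clusters = sorted(set(cluster_of.values()))
    rng.shuffle(clusters)
    n_cbr = int(len(clusters) * cbr_share)

    # CBR portion: every user in a cluster shares that cluster's arm.
    cbr_arm = {cid: rng.choice(("treatment", "control")) for cid in clusters[:n_cbr]}

    plan = {}
    for user in sorted(cluster_of):
        cid = cluster_of[user]
        if cid in cbr_arm:
            plan[user] = ("CBR", cbr_arm[cid])
        else:
            # UBR portion: each remaining user is randomized independently.
            plan[user] = ("UBR", rng.choice(("treatment", "control")))
    return plan


# Toy population: 40 users in 8 clusters of 5.
toy_clusters = {f"u{i}": i // 5 for i in range(40)}
toy_plan = joint_assignment(toy_clusters, cbr_share=0.5, seed=1)
```

Running both designs side by side on disjoint traffic is what allows the CBR and UBR lift estimates to be compared for the same strategy over the same period.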