A Nonparametric Bayes Approach to Online Activity Prediction

Mario Beraha; Lorenzo Masoero; Stefano Favaro; Thomas S. Richardson

TL;DR

This work develops a Bayesian nonparametric framework for online activity forecasting. It addresses two prediction problems: the number of active users $N_D$ over a given horizon, and the time $D_M$ needed to reach a participation threshold $M$. It introduces two models under a stable Beta-scaled process (SB-SP) prior that capture heterogeneous user engagement via trait allocations, yielding closed-form marginal and posterior distributions that make posterior predictive calculations tractable. The authors propose two complementary strategies to estimate $D_M$: inversion of a global prediction band, and direct Monte Carlo sampling from the posterior of $D_M$. The prior is fitted with an empirical Bayes approach. Empirical evaluation on synthetic data and 210 real-world A/B tests shows that the geometric variant (GM) often outperforms both the competing methods and BM, which makes it useful for planning online experiments and allocating resources on digital platforms.

Abstract

Accurately predicting the onset of specific activities within defined timeframes holds significant importance in several applied contexts. In particular, accurate prediction of the number of future users that will be exposed to an intervention is an important piece of information for experimenters running online experiments (A/B tests). In this work, we propose a novel approach to predict the number of users that will be active in a given time period, as well as the temporal trajectory needed to attain a desired user participation threshold. We model user activity using a Bayesian nonparametric approach which allows us to capture the underlying heterogeneity in user engagement. We derive closed-form expressions for the number of new users expected in a given period, and a simple Monte Carlo algorithm targeting the posterior distribution of the number of days needed to attain a desired number of users; the latter is important for experimental planning. We illustrate the performance of our approach via several experiments on synthetic and real world data, in which we show that our novel method outperforms existing competitors.

Paper Structure (30 sections, 8 theorems, 39 equations, 5 figures, 3 algorithms)

Introduction
Two BNP Models for Activity Prediction
Completely Random Measure Priors and their limitations
The Stable Beta-Scaled Process Prior
Bayesian analysis under the SB-SP Prior
Activity Prediction
Numerical Illustrations
Comparison between the two models
Estimation of $D_M$
Comparison on Real Datasets
Discussion
Background Material on Bayesian nonparametrics and Random Measures
Proofs
Proof of Theorem \ref{thm:postmod1}
Proof of Theorem \ref{thm:postmod2}
...and 15 more sections

Key Result

Theorem 1

Let $Z_1, \ldots, Z_d$ be distributed according to \ref{eq:bnp1}. Denote by $\omega^*_1, \ldots, \omega^*_{N_d}$ the observed user-specific labels, and let $M_i = \sum_{j=1}^d Z_{j, i}$. Then, the marginal distribution of $Z_1, \ldots, Z_d$ is where $B(\cdot, \cdot)$ is the beta function and $\gamma_d = \alpha \sum_{i=1}^d B(1 - \alpha, i)$. Moreover, the posterior distribution of $\tilde{\mu}$ coincid
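The constant $\gamma_d = \alpha \sum_{i=1}^d B(1 - \alpha, i)$ from the theorem statement is straightforward to evaluate numerically. The following is a minimal sketch, not the authors' code: the function names `beta_fn` and `gamma_d` are hypothetical, and the restriction $0 < \alpha < 1$ is an assumption made here so that $B(1 - \alpha, i)$ is well defined.

```python
import math


def beta_fn(a: float, b: float) -> float:
    """Beta function B(a, b), computed via log-gamma for numerical stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def gamma_d(alpha: float, d: int) -> float:
    """Evaluate gamma_d = alpha * sum_{i=1}^d B(1 - alpha, i).

    Assumption (not stated in the excerpt): 0 < alpha < 1 and d >= 1.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha is assumed to lie in (0, 1)")
    if d < 1:
        raise ValueError("d must be a positive integer")
    return alpha * sum(beta_fn(1.0 - alpha, i) for i in range(1, d + 1))


if __name__ == "__main__":
    # Illustrative values only; alpha = 0.5 is not taken from the paper.
    for days in (1, 7, 30):
        print(days, gamma_d(0.5, days))
```

Since $B(1 - \alpha, 1) = 1/(1 - \alpha)$, the first term gives $\gamma_1 = \alpha / (1 - \alpha)$, which is a convenient sanity check.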

Figures (5)

Figure 1: Inversion technique to estimate $D_M$. The solid black line represents the data. The dashed black line and gray area are the mean of \ref{eq:pred_nd} and the global credible band, respectively. The interval for $D_M$ (orange curly bracket) is obtained by slicing the gray area at $M$.
Figure 2: Absolute and relative errors for the prediction of $N_D$ in the settings described in Section \ref{sec:model_comparison}.
Figure 3: Length of the prediction intervals for $D_M$ for 500 simulated datasets according to the simulation in Section \ref{sec:interval_comparison}. Different plots correspond to different values of $M$. In each plot the tail parameter of the data generating process varies along the $x$-axis.
Figure 4: Ranking of the models on the 210 real datasets analyzed in Section \ref{sec:real_data}.
Figure 5: Survival function for the prediction accuracy (left) and boxplots (right) for different methods on the 210 real datasets analyzed in Section \ref{sec:real_data}.
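The inversion idea in the Figure 1 caption (slicing a prediction band horizontally at level $M$) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the band is assumed to be given as two nondecreasing sequences of predicted cumulative user counts, one value per future day, and the names `first_crossing` and `invert_band` as well as the numbers in the example are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def first_crossing(curve: Sequence[float], threshold: float) -> Optional[int]:
    """Return the first 1-based day on which `curve` reaches `threshold`.

    Returns None if the curve never reaches the threshold on the given grid.
    """
    for day, value in enumerate(curve, start=1):
        if value >= threshold:
            return day
    return None


def invert_band(
    lower: Sequence[float], upper: Sequence[float], threshold: float
) -> Tuple[Optional[int], Optional[int]]:
    """Slice a band for the cumulative user count at `threshold`.

    The upper band edge reaches the threshold first, so it gives the earliest
    plausible day; the lower band edge gives the latest plausible day.
    Assumption: both sequences are nondecreasing and indexed by future day.
    """
    if len(lower) != len(upper):
        raise ValueError("band edges must have the same length")
    return first_crossing(upper, threshold), first_crossing(lower, threshold)


if __name__ == "__main__":
    # Made-up band edges for six future days (not data from the paper).
    lower = [10, 18, 25, 31, 36, 40]
    upper = [14, 26, 37, 47, 56, 64]
    print(invert_band(lower, upper, threshold=35))
```

If the lower edge never reaches $M$ within the forecast grid, the sketch returns `None` for the right endpoint, signalling that the grid should be extended before an interval can be reported.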

Theorems & Definitions (9)

Definition 1
Theorem 1
Theorem 2
Proposition 1
Theorem 3
Proposition 2: Marginal law, Proposition 3.1 in Jam(17)
Theorem 4: Posterior law, Theorem 3.1 in Jam(17)
Proposition 3: Compound Poisson representation, Proposition 3.3 in Jam(17)
Lemma 1
