Model-Free Approximate Bayesian Learning for Large-Scale Conversion Funnel Optimization

Garud Iyengar, Raghav Singal

TL;DR

This work proposes a novel attribution-based decision-making algorithm that inherits the interpretability and scalability of Thompson sampling for bandits and maintains an approximate belief over the value of each state-specific intervention. The algorithm significantly outperforms traditional approaches on extensive simulations calibrated to a real-world email marketing dataset.

Abstract

The flexibility of choosing the ad action as a function of the consumer state is critical for modern-day marketing campaigns. We study the problem of identifying the optimal sequential personalized interventions that maximize the adoption probability for a new product. We model consumer behavior by a conversion funnel that captures the state of each consumer (e.g., interaction history with the firm) and allows the consumer behavior to vary as a function of both her state and the firm's sequential interventions. We show that our model captures consumer behavior with very high accuracy (out-of-sample AUC of over 0.95) in a real-world email marketing dataset. However, it results in a very large-scale learning problem, where the firm must learn the state-specific effects of various interventions from consumer interactions. We propose a novel attribution-based decision-making algorithm for this problem that we call model-free approximate Bayesian learning. Our algorithm inherits the interpretability and scalability of Thompson sampling for bandits and maintains an approximate belief over the value of each state-specific intervention. The belief is updated as the algorithm interacts with the consumers. Although the update only approximates the Bayes update, we prove the asymptotic optimality of our algorithm and analyze its convergence rate. We show that our algorithm significantly outperforms traditional approaches on extensive simulations calibrated to a real-world email marketing dataset.

Paper Structure

This paper contains 31 sections, 39 equations, 12 figures, 3 tables, 3 algorithms.

Figures (12)

  • Figure 1: Sequence of emails received by one of the authors after providing his email to Netflix (but not subscribing to the membership). The emails were sent 15-20 days apart (May 3, May 18, and June 8), and the content of each email was unique. The subject lines of the three emails were "Movies & TV shows your way", "Watch TV shows & movies anytime, anywhere", and "Netflix - something for everyone", respectively.
  • Figure 2: Distribution of number of emails received, opened, and clicked. For each distribution, we ignore the value of 0 to understand paths with some activity. Hence, the first bucket (x-axis) corresponds to a value of 1. Given our time horizon of $T=14$ days and a daily frequency of emails, the maximum value on the x-axis is 14.
  • Figure 3: Empirical proportion of paths that converted as a function of the consumer's level of interaction. These exploratory plots simply summarize the observed data. For example, out of the paths that received exactly 3 emails, a fraction of approximately $0.005$ converted. Note that the "received" tag only counts the number of emails received; some of these might have been opened or clicked.
  • Figure 4: Illustration of Example \ref{example:bandit}. In subplot (a), we show the model for consumer behavior. If the firm takes action $a \in \{1,2\}$, the consumer converts ($c$) w.p. $p_a$. The solid blue action ($a=1$) corresponds to a "call-to-action" type ad with a higher one-step conversion probability than the dashed red action ($a=2$), e.g., $p_1 = 0.3$ and $p_2 = 0$. In subplot (b), we illustrate the performance of TS. We display the evolution of the Beta belief over $[p_a]_{a=1}^2$ (denoted by $[Q_a]_a$) as a function of the number of consumers (iteration). The initial count is set as $(\alpha_a, \beta_a) = (1,1)$ for $a \in \{1,2\}$, and hence, the initial belief (iteration 1) is Uniform$(0,1)$. As TS interacts with more consumers (see iterations 100 and 1000), action 1 (the optimal action) is chosen more often than action 2, and hence, the belief over action 1 concentrates around the true value of $0.3$. The belief over action 2 essentially remains the same after the 100th consumer, suggesting that TS learned the optimal action by interacting with 100 consumers.
  • Figure 5: Benchmarking MFABL/pMFABL against TS, PSRL, and QL-UCB in terms of the performance ratio (PR). In subplot (a), we show the mean and standard deviation of the PR over $R=100$ seeds with $N = 500,000$ consumers. In subplot (b), we show the evolution of the PR as we increase $N$ from 1 to 500,000 (averaged over $R=100$ seeds).
  • ...and 7 more figures
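
The Thompson sampling (TS) setup described in the Figure 4 caption can be sketched in a few lines. This is a minimal illustration of standard Beta-Bernoulli TS on the two-action example ($p_1 = 0.3$, $p_2 = 0$, initial counts $(\alpha_a, \beta_a) = (1,1)$); it is not the paper's MFABL algorithm, and the function name `thompson_sampling`, the `seed` parameter, and the `pulls` counter are hypothetical names introduced here for illustration.

```python
import random


def thompson_sampling(true_p, n_consumers, seed=0):
    """Beta-Bernoulli Thompson sampling on a bandit with conversion probs true_p.

    Illustrative sketch only: names and the seeding scheme are assumptions,
    not taken from the paper.
    """
    rng = random.Random(seed)
    n_actions = len(true_p)
    # Initial counts (alpha_a, beta_a) = (1, 1), i.e. a Uniform(0, 1) belief.
    alpha = [1] * n_actions
    beta = [1] * n_actions
    pulls = [0] * n_actions

    for _ in range(n_consumers):
        # Draw one sample per action from its current Beta belief.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_actions)]
        # Act greedily with respect to the sampled values.
        a = max(range(n_actions), key=lambda i: samples[i])
        # Observe whether this consumer converts under the chosen action.
        converted = rng.random() < true_p[a]
        # Conjugate update: a conversion increments alpha, otherwise beta.
        if converted:
            alpha[a] += 1
        else:
            beta[a] += 1
        pulls[a] += 1

    return alpha, beta, pulls


# Two actions as in the Figure 4 caption: p_1 = 0.3, p_2 = 0.
alpha, beta, pulls = thompson_sampling([0.3, 0.0], n_consumers=1000)
posterior_mean = [alpha[a] / (alpha[a] + beta[a]) for a in range(2)]
print("pulls per action:", pulls)
print("posterior means:", posterior_mean)
```

Under these assumptions, the first action should be chosen far more often than the second, and its posterior mean should settle near $0.3$, which mirrors the qualitative behavior the caption describes for iterations 100 and 1000.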