
Some Statistical and Data Challenges When Building Early-Stage Digital Experimentation and Measurement Capabilities

C. H. Bryan Liu

TL;DR

This work develops a rigorous framework for building and justifying early-stage digital experimentation and measurement (DEM) capabilities. It introduces a ranking under lower uncertainty model to quantify the value and risk of DEM-driven noise reduction, providing closed-form expressions for the expected gain $\mathbb{E}(\mathcal{D})$ and its variance, thereby enabling Sharpe-ratio-based business cases. It then offers a comprehensive treatment of statistical testing in digital experiments (NHST, Bayesian, sequential, and non-parametric methods), followed by a taxonomy of digital experiment datasets and an evaluation framework for personalization experiment designs. The thesis is substantiated with empirical verifications, public datasets (including the ASOS Digital Experiments Dataset), and case studies in e-commerce and marketing, culminating in a concrete path to apply these methodologies in industry settings and future research directions. Overall, the work advances both theory and practice for evaluating, designing, and deploying DEM capabilities, with emphasis on data efficiency, causality, and scalable decision-making under uncertainty across multiple designs and datasets.

Abstract

Digital experimentation and measurement (DEM) capabilities -- the knowledge and tools necessary to run experiments with digital products, services, or experiences and measure their impact -- are fast becoming part of the standard toolkit of digital/data-driven organisations in guiding business decisions. Many large technology companies report having mature DEM capabilities, and several businesses have been established purely to manage experiments for others. Given the growing evidence that data-driven organisations tend to outperform their non-data-driven counterparts, there has never been a greater need for organisations to build/acquire DEM capabilities to thrive in the current digital era. This thesis presents several novel approaches to statistical and data challenges for organisations building DEM capabilities. We focus on the fundamentals associated with building DEM capabilities, which lead to a richer understanding of the underlying assumptions and thus enable us to develop more appropriate capabilities. We address why one should engage in DEM by quantifying the benefits and risks of acquiring DEM capabilities. This is done using a ranking under lower uncertainty model, enabling one to construct a business case. We also examine what ingredients are necessary to run digital experiments. In addition to clarifying the existing literature around statistical tests, datasets, and methods in experimental design and causal inference, we construct an additional dataset and detailed case studies on applying state-of-the-art methods. Finally, we investigate when a digital experiment design would outperform another, leading to an evaluation framework that compares competing designs' data efficiency.

Paper Structure (196 sections, 230 equations, 25 figures, 10 tables)


Figures (25)

  • Figure 1: Illustration of an A/B test (a randomised controlled trial with two parallel groups) designed to test the claim that "offering free delivery to users of an e-commerce website will lead to a larger proportion of the said users making a purchase." Incoming users are split randomly into two groups, where one group (A) acts as the control, and the other group (B) is shown a "free delivery" banner on the website as the treatment.
  • Figure 2: Prioritising four projects (the fruits) according to their value (x-axes). The semi-opaque icons represent the projects' true value, and the solid icons represent possible project value estimates under some level of uncertainty (horizontal lines) in the estimation process. (Top) Under a noisy estimation process, projects with a low true value (e.g., project apple) may appear to have a high value and be prioritised erroneously. (Bottom) DEM reduces the estimation noise, enabling a better prioritisation with value estimates closer to the truth. The figures include icons designed by Smashicons from Flaticon.com.
  • Figure 3: The generative model in the ranking under lower uncertainty problem in plate notation. $\mathcal{V}_i$ represents the true, unobserved values of the items to be ranked. $\mathcal{E}_i$ represents the observed values under some estimation noise level $\sigma^2_\epsilon$.
  • Figure 4: The generative model in the ranking under lower uncertainty problem, in plate notation, when two distinct noise levels are involved. When we change the noise level of our ranking under uncertainty setup from $\sigma^2_1$ to $\sigma^2_2$ (see Figure 3), we obtain two sets of observed values, $\mathcal{H}_i$ and $\mathcal{L}_i$, one for each noise level.
  • Figure 5: Relationship between different variances/covariances used to calculate the variance of $\mathcal{D}$, the value gained when the estimation noise is reduced. An arrow from quantity A to B means the value of B depends on the value of A.
  • ...and 20 more figures
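
The setup described in the captions of Figures 2 to 5 can be illustrated with a small Monte Carlo sketch. It is not the thesis's closed-form result. It assumes normally distributed true values $\mathcal{V}_i$, independent normal estimation noise at the two levels $\sigma^2_1$ and $\sigma^2_2$, and a gain $\mathcal{D}$ defined as the difference in mean true value between the top-M items picked under lower noise and those picked under higher noise. The function name, parameter names, and default values are all hypothetical.

```python
import numpy as np


def simulate_gain(n_items=50, top_m=10, sigma_v=1.0,
                  sigma_high=1.0, sigma_low=0.3,
                  n_trials=2000, seed=0):
    """Simulate the value gained from ranking under lower estimation noise.

    Assumptions (not taken from the source): normal true values, independent
    normal noise for the two estimates, and gain measured as the difference
    in mean true value of the top-M selected items.
    """
    rng = np.random.default_rng(seed)

    # True, unobserved item values V_i (one row per simulated trial).
    true_values = rng.normal(0.0, sigma_v, size=(n_trials, n_items))

    # Observed values under the higher (H_i) and lower (L_i) noise levels.
    high_noise_est = true_values + rng.normal(0.0, sigma_high, size=true_values.shape)
    low_noise_est = true_values + rng.normal(0.0, sigma_low, size=true_values.shape)

    def mean_true_value_of_top(estimates):
        # Rank items by their estimates, keep the top M, score by true value.
        top_idx = np.argsort(estimates, axis=1)[:, -top_m:]
        return np.take_along_axis(true_values, top_idx, axis=1).mean(axis=1)

    # One simulated gain D per trial.
    return mean_true_value_of_top(low_noise_est) - mean_true_value_of_top(high_noise_est)


gains = simulate_gain()
mean_gain = gains.mean()       # Monte Carlo estimate of E(D)
std_gain = gains.std(ddof=1)   # Monte Carlo estimate of the std. dev. of D
print(f"estimated E(D): {mean_gain:.3f}")
print(f"estimated sd(D): {std_gain:.3f}")
print(f"gain-to-risk ratio E(D)/sd(D): {mean_gain / std_gain:.3f}")
```

The final ratio is a Sharpe-ratio-like summary of gain against risk under these simulated assumptions. Changing `sigma_high` and `sigma_low` shows how the size of the noise reduction affects both the expected gain and its spread.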