
Double Machine Learning at Scale to Predict Causal Impact of Customer Actions

Sushant More, Priya Kotwal, Sujith Chappidi, Dinesh Mandalapu, Chris Khawand

TL;DR

This paper operationalizes DML through a Spark-based causal ML library with a flexible, JSON-driven model configuration approach to estimate CI at scale (i.e., across hundreds of actions and millions of customers). It outlines the DML methodology and implementation, and the associated benefits over the traditional potential-outcomes-based CI model.

Abstract

Causal Impact (CI) of customer actions is broadly used across the industry to inform both short- and long-term investment decisions of various types. In this paper, we apply the double machine learning (DML) methodology to estimate CI values across hundreds of customer actions of business interest and hundreds of millions of customers. We operationalize DML through a causal ML library based on Spark with a flexible, JSON-driven model configuration approach to estimate CI at scale (i.e., across hundreds of actions and millions of customers). We outline the DML methodology and implementation, and the associated benefits over the traditional potential-outcomes-based CI model. We show population-level as well as customer-level CI values along with confidence intervals. The validation metrics show a 2.2% gain over the baseline methods and a 2.5X gain in computation time. Our contribution is to advance the scalable application of CI, while also providing an interface that allows faster experimentation, cross-platform support, and the ability to onboard new use cases, and that improves the accessibility of the underlying code for partner teams.
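
To make the DML idea in the abstract concrete, the following is a minimal sketch of the standard partialling-out DML estimator with cross-fitting for a partially linear model. It is not the paper's Spark library: the function names, the synthetic data, and the use of ordinary least squares as the nuisance model are all assumptions made for illustration, and a production setup would presumably swap in more flexible learners.

```python
import numpy as np


def fit_predict_ols(X_train, y_train, X_test):
    """Stand-in nuisance model (assumption): least squares with an intercept."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    return A_test @ coef


def dml_partial_linear(X, t, y, n_folds=2, seed=0):
    """Partialling-out DML with cross-fitting.

    Returns the estimated effect of t on y and its standard error.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    t_res = np.empty(n)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Residualize outcome and treatment using models fit on the other folds.
        y_res[test] = y[test] - fit_predict_ols(X[train], y[train], X[test])
        t_res[test] = t[test] - fit_predict_ols(X[train], t[train], X[test])
    # Regress outcome residuals on treatment residuals.
    theta = (t_res @ y_res) / (t_res @ t_res)
    # Plug-in standard error from the orthogonal score.
    psi = t_res * (y_res - theta * t_res)
    se = np.sqrt(np.mean(psi**2) / n) / np.mean(t_res**2)
    return theta, se


# Synthetic example (hypothetical data): the true effect of the action is 2.0.
rng = np.random.default_rng(42)
n = 5000
X = rng.normal(size=(n, 3))
t = (X @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n) > 0).astype(float)
y = 2.0 * t + X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

theta, se = dml_partial_linear(X, t, y)
ci_low, ci_high = theta - 1.96 * se, theta + 1.96 * se
print(f"estimated CI value: {theta:.3f}, 95% interval: [{ci_low:.3f}, {ci_high:.3f}]")
```

Cross-fitting (fitting the nuisance models on one set of folds and residualizing on the held-out fold) is what keeps overfitting in the nuisance models from biasing the final estimate. The interval printed here is a population-level one; customer-level values would need a heterogeneous-effect extension that this sketch does not cover.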

Paper Structure (20 sections, 10 equations, 11 figures, 1 table)

This paper contains 20 sections, 10 equations, 11 figures, 1 table.

Figures (11)

  • Figure 1: Representative propensity score distributions for the control (top panel) and treatment (bottom panel) groups.
  • Figure 2: Effect of propensity score trimming and rescaling on the estimated CI for a certain customer action.
  • Figure 3: Schematic for the calculation of distances from cluster centroids. The red dot is represented by three features, which are its distances from the centroids of the blue, green, and black clusters.
  • Figure 4: Schematic of the CI-DML modeling framework.
  • Figure 5: JSON-Machine Learning Stage Interpreter modeling stages
  • ...and 6 more figures
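
The centroid-distance representation described in the Figure 3 caption can be sketched as follows. This is an illustration only: the Euclidean metric, the function name, and the example coordinates are assumptions, since the caption does not state how the distances are computed.

```python
import numpy as np


def centroid_distance_features(points, centroids):
    """Represent each point by its distance to every cluster centroid.

    points: array of shape (n_points, n_dims)
    centroids: array of shape (n_clusters, n_dims)
    Returns an array of shape (n_points, n_clusters).
    Euclidean distance is an assumption made for this sketch.
    """
    diffs = points[:, None, :] - centroids[None, :, :]
    return np.linalg.norm(diffs, axis=2)


# Hypothetical centroids standing in for the blue, green, and black clusters.
centroids = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
red_dot = np.array([[3.0, 4.0]])

features = centroid_distance_features(red_dot, centroids)
print(features)
```

Each point ends up with one feature per cluster, so three clusters yield a three-dimensional representation regardless of the original feature count.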