Accelerating Causal Algorithms for Industrial-scale Data: A Distributed Computing Approach with Ray Framework

Vishal Verma; Vinod Reddy; Jaiprakash Ravi

TL;DR

The paper addresses the challenge of scalable causal analysis on industrial-scale data by integrating the Ray distributed computing framework with OCML-based causal inference, illustrated through the Nexus platform and a Dream11 case study. It presents a distributed OCML workflow with cross-fitting and hyperparameter tuning that substantially reduces runtime and makes analyses feasible on datasets with hundreds of covariates and millions of units. Key contributions include the Nexus architecture, distributed cross-fitting (Section 5.1), distributed hyperparameter tuning (Section 5.2), and empirical scalability results (Section 5.3) showing improved performance over single-node implementations. The work has practical implications for deploying causal analysis at scale in industry: it offers a path toward faster, more cost-efficient causal decision-making and toward scaling additional causal algorithms and causal discovery methods in the future.

Abstract

The increasing need for causal analysis in large-scale industrial datasets necessitates the development of efficient and scalable causal algorithms for real-world applications. This paper addresses the challenge of scaling causal algorithms in the context of conducting causal analysis on extensive datasets commonly encountered in industrial settings. Our proposed solution involves enhancing the scalability of causal algorithm libraries, such as EconML, by leveraging the parallelism capabilities offered by the distributed computing framework Ray. We explore the potential of parallelizing key iterative steps within causal algorithms to significantly reduce overall runtime, supported by a case study that examines the impact on estimation times and costs. Through this approach, we aim to provide a more effective solution for implementing causal analysis in large-scale industrial applications.
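The idea of parallelizing the iterative steps can be illustrated with cross-fitting, where each fold's nuisance models are trained independently of the other folds. The sketch below is a minimal, standard-library-only illustration and not the paper's implementation: it uses a thread pool as a stand-in for Ray tasks, a one-covariate least-squares fit as a stand-in for the ML nuisance learners, and synthetic data with a true effect of 2.0. The names `fit_ols`, `fold_residuals`, and `cross_fit_effect` are hypothetical. With Ray, the per-fold function would typically be decorated with `@ray.remote`, launched with `.remote(...)`, and collected with `ray.get(...)`.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fit_ols(xs, ys):
    """Fit y = a + b*x by least squares (stand-in for any ML nuisance learner)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fold_residuals(fold, folds, X, T, Y):
    """Train nuisance models off-fold, then residualize T and Y on the held-out fold."""
    train = [i for k, idx in enumerate(folds) if k != fold for i in idx]
    a_t, b_t = fit_ols([X[i] for i in train], [T[i] for i in train])
    a_y, b_y = fit_ols([X[i] for i in train], [Y[i] for i in train])
    return [(T[i] - (a_t + b_t * X[i]), Y[i] - (a_y + b_y * X[i]))
            for i in folds[fold]]


def cross_fit_effect(X, T, Y, n_folds=5, parallel=True):
    """Residual-on-residual effect estimate with cross-fitted nuisance predictions."""
    folds = [list(range(k, len(X), n_folds)) for k in range(n_folds)]
    if parallel:
        # Folds are independent, so each one can run as its own task.
        # A thread pool is used here only as a stand-in for Ray tasks.
        with ThreadPoolExecutor(max_workers=n_folds) as pool:
            parts = list(pool.map(
                lambda k: fold_residuals(k, folds, X, T, Y), range(n_folds)))
    else:
        parts = [fold_residuals(k, folds, X, T, Y) for k in range(n_folds)]
    res = [pair for part in parts for pair in part]
    return sum(rt * ry for rt, ry in res) / sum(rt * rt for rt, _ in res)


# Synthetic data (assumption for illustration): true treatment effect is 2.0.
rng = random.Random(0)
X = [rng.uniform(-1, 1) for _ in range(2000)]
T = [0.5 * x + rng.gauss(0, 1) for x in X]
Y = [2.0 * t + 1.5 * x + rng.gauss(0, 1) for t, x in zip(T, X)]

theta_parallel = cross_fit_effect(X, T, Y, parallel=True)
theta_sequential = cross_fit_effect(X, T, Y, parallel=False)
```

Because the folds share no state, the parallel and sequential runs produce the same estimate; only the wall-clock time differs once the per-fold model fits are expensive enough to outweigh scheduling overhead.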

Paper Structure (14 sections, 8 equations, 6 figures, 1 table)

Introduction
Preliminaries
Observational Causal Inference (OCI): Setup and Assumptions
Selection on Observables
Orthogonal/Debiased Machine Learning
Distributed Computing Using Ray
Related Work
Applications at Dream11
Case Study: Accelerating OCML
Distributed Crossfitting
Distributed Tuning
Running Time and Scalability
Conclusion and Future Scope
Acknowledgements

Figures (6)

Figure 1: U are unobserved entities. Assumption 4 means that there is no causal link between U and the observed data.
Figure 2: End To End OCI workflow at Dream11
Figure 3: Sequential Cross Validation
Figure 4: Parallel Cross Validation using Ray Tasks
Figure 5: Distributed HyperParam Optimization using Ray Tune (Img source: https://speakerdeck.com/anyscale/fast-and-efficient-hyperparameter-tuning-with-ray-tune?slide=51)
...and 1 more figure

Theorems & Definitions (1)

proof
