
Optimization Opportunities for Cloud-Based Data Pipeline Infrastructures

Johannes Jablonski, Georg-Daniel Schwarz, Philip Heltweg, Dirk Riehle

Abstract

Cloud infrastructure supports the efficient operation of data pipelines with respect to requirements such as cost, speed, and resource utilization. We present an integrated view of optimization opportunities for cloud-based data pipelines, based on a systematic review of the existing literature on optimizing cloud infrastructure performance for data pipelines. Our study contributes a theory of optimization goals, such as minimizing cost, reducing execution time, and balancing cost-makespan trade-offs, organized along dimensions such as single- vs. multi-cloud and batch vs. stream processing. We highlight gaps in primary research, including the under-exploration of multi-tenant environments and the lack of industry evaluation, and suggest directions for future research.

Paper Structure

This paper contains 62 sections, 8 figures, and 12 tables.

Figures (8)

  • Figure 1: Overview of the complete research process, from initial study design to final presentation.
  • Figure 2: Literature selection process, from the original sources through filtering by multiple co-authors for duplicates, title, abstract, and full text. The final result is a set of relevant articles.
  • Figure 3: Year distribution of the papers included in the analysis.
  • Figure 4: Overview of the concepts discovered through our study.
  • Figure 5: Distribution of infrastructure types.
  • ...and 3 more figures