WORKSWORLD: A Domain for Integrated Numeric Planning and Scheduling of Distributed Pipelined Workflows

Taylor Paul, William Regli

Abstract

This work pursues automated planning and scheduling of distributed data pipelines, or workflows. We develop a general workflow and resource graph representation that includes both data processing and sharing components with corresponding network interfaces for scheduling. Leveraging these graphs, we introduce WORKSWORLD, a new domain for domain-independent numeric planners, designed for permanently scheduled workflows such as ingest pipelines. Our framework permits users to define data sources, available workflow components, and desired data destinations and formats without explicitly declaring the entire workflow graph as a goal. The planner solves a joint planning and scheduling problem, producing a plan that both builds the workflow graph and schedules its components on the resource graph. We empirically show that a state-of-the-art numeric planner running on commodity hardware with one hour of CPU time and 30 GB of memory can solve linear-chain workflows of up to 14 components across eight sites.
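The abstract describes scheduling workflow components onto a resource graph of sites. The Python sketch below illustrates only the scheduling half of that joint problem, for a single linear-chain workflow, under simplifying assumptions that are not taken from the paper: each site has one compute and one storage capacity, consecutive components placed on different sites need a direct link, and each link has a bandwidth budget shared by the traffic crossing it. The names `Site`, `Component`, and `placement_is_feasible` are hypothetical; this is not the paper's PDDL encoding and does not reproduce its Total Cost or Absolute Latency definitions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    cpu: float      # available compute at the site (hypothetical units)
    storage: float  # available storage at the site (hypothetical units)


@dataclass(frozen=True)
class Component:
    name: str
    cpu: float       # compute demand of the component
    storage: float   # storage demand of the component
    out_rate: float  # data rate sent to the next component in the chain


def placement_is_feasible(chain, sites, links, placement):
    """Return True if a linear-chain placement respects site and link capacities.

    chain:     list of Component objects in pipeline order
    sites:     dict mapping site name -> Site
    links:     dict mapping frozenset({site_a, site_b}) -> direct-link bandwidth
    placement: dict mapping component name -> site name
    """
    cpu_used = defaultdict(float)
    storage_used = defaultdict(float)
    for comp in chain:
        site = placement[comp.name]
        if site not in sites:
            return False  # component placed on an unknown site
        cpu_used[site] += comp.cpu
        storage_used[site] += comp.storage
    for name, site in sites.items():
        if cpu_used[name] > site.cpu or storage_used[name] > site.storage:
            return False  # site capacity exceeded

    link_used = defaultdict(float)
    for producer, consumer in zip(chain, chain[1:]):
        a, b = placement[producer.name], placement[consumer.name]
        if a == b:
            continue  # co-located components exchange data without a network link
        key = frozenset((a, b))
        if key not in links:
            return False  # consecutive components on different sites need a direct link
        link_used[key] += producer.out_rate
    # Each direct link must carry its summed traffic within its bandwidth.
    return all(link_used[k] <= links[k] for k in link_used)


# Hypothetical two-site example: ingest -> parse -> store.
sites = {"A": Site("A", cpu=4, storage=10), "B": Site("B", cpu=2, storage=5)}
links = {frozenset(("A", "B")): 100.0}
chain = [
    Component("ingest", cpu=2, storage=4, out_rate=50.0),
    Component("parse", cpu=1, storage=1, out_rate=20.0),
    Component("store", cpu=1, storage=3, out_rate=0.0),
]
placement = {"ingest": "A", "parse": "B", "store": "A"}
print(placement_is_feasible(chain, sites, links, placement))  # True
```

Lowering the A-B bandwidth to 60 makes the same placement infeasible, since the two cross-site hops together carry 70. In the framing of the abstract, a planner constructs the workflow and its schedule jointly; this checker only evaluates one given placement.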

Paper Structure

This paper contains 24 sections, 4 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: The Worksworld framework. Primary research contributions are green; secondary engineering artifacts provided without experimental evaluation are yellow.
  • Figure 2: Our scheduling problem consists of mapping the movable components on the left-hand side onto the compute, storage, and network resources available across sites.
  • Figure 3: Sites and the relevant annotations and resources for our scheduling problem.
  • Figure 4: PDDL type hierarchy for Worksworld and abbreviations used throughout the paper.
  • Figure 5: Planning and scheduling linear-chain workflows while varying one aspect at a time: workflow components, interfaces, direct links, or sites.
  • ...and 1 more figure

Theorems & Definitions (10)

  • Definition 1: Pipelined Workflow Scheduling
  • Definition 2: Numeric Action
  • Definition 3: Numeric Planning Problem
  • Definition 4: Plan
  • Definition 5: Site
  • Definition 6: Interface
  • Definition 7: Workflow Component
  • Definition 8: Workflow
  • Definition 9: Total Cost
  • Definition 10: Absolute Latency
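Definitions 2, 3, and 4 are listed here only by name. As a rough orientation, the Python sketch below shows the common numeric-planning semantics such terms usually refer to, in the style of PDDL 2.1 numeric fluents: a state pairs a set of true facts with numeric variables, an action has propositional and numeric preconditions plus add, delete, and numeric effects, and a plan is a sequence of actions that are applicable in turn and reach the goal. It is an assumption that the paper's definitions match this exactly. Numeric effects here are restricted to additive increases and decreases, and all action, fact, and fluent names (`deploy(ingest,A)`, `cpu(A)`, and so on) are hypothetical rather than taken from WORKSWORLD.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Tuple

Fluents = Dict[str, float]


@dataclass
class NumericAction:
    """Hypothetical numeric action: propositional and numeric preconditions and effects."""
    name: str
    pre_facts: FrozenSet[str] = frozenset()
    pre_num: List[Callable[[Fluents], bool]] = field(default_factory=list)
    add: FrozenSet[str] = frozenset()
    delete: FrozenSet[str] = frozenset()
    # Numeric effects restricted to increase/decrease: fluent name -> delta.
    num_effects: Dict[str, float] = field(default_factory=dict)

    def applicable(self, facts: FrozenSet[str], fluents: Fluents) -> bool:
        return self.pre_facts <= facts and all(cond(fluents) for cond in self.pre_num)

    def apply(self, facts: FrozenSet[str], fluents: Fluents) -> Tuple[FrozenSet[str], Fluents]:
        new_facts = (facts - self.delete) | self.add
        new_fluents = dict(fluents)  # copy so the input state is not mutated
        for var, delta in self.num_effects.items():
            new_fluents[var] += delta
        return new_facts, new_fluents


def validate_plan(plan, facts, fluents, goal):
    """Valid if every action is applicable in turn and all goal facts hold at the end."""
    for action in plan:
        if not action.applicable(facts, fluents):
            return False
        facts, fluents = action.apply(facts, fluents)
    return goal <= facts


# Hypothetical state: two components to deploy, CPU capacity at sites A and B.
facts0 = frozenset({"available(ingest)", "available(parse)"})
fluents0 = {"cpu(A)": 4.0, "cpu(B)": 2.0}

deploy_ingest_A = NumericAction(
    "deploy(ingest,A)",
    pre_facts=frozenset({"available(ingest)"}),
    pre_num=[lambda f: f["cpu(A)"] >= 2],
    add=frozenset({"deployed(ingest,A)"}),
    delete=frozenset({"available(ingest)"}),
    num_effects={"cpu(A)": -2},
)
deploy_parse_A = NumericAction(
    "deploy(parse,A)",
    pre_facts=frozenset({"available(parse)"}),
    pre_num=[lambda f: f["cpu(A)"] >= 3],
    add=frozenset({"deployed(parse,A)"}),
    delete=frozenset({"available(parse)"}),
    num_effects={"cpu(A)": -3},
)
deploy_parse_B = NumericAction(
    "deploy(parse,B)",
    pre_facts=frozenset({"available(parse)"}),
    pre_num=[lambda f: f["cpu(B)"] >= 1],
    add=frozenset({"deployed(parse,B)"}),
    delete=frozenset({"available(parse)"}),
    num_effects={"cpu(B)": -1},
)
goal = frozenset({"deployed(ingest,A)", "deployed(parse,B)"})

print(validate_plan([deploy_ingest_A, deploy_parse_B], facts0, fluents0, goal))  # True
print(validate_plan([deploy_ingest_A, deploy_parse_A], facts0, fluents0, goal))  # False
```

A numeric planner searches for such a sequence; this checker only validates one. The second plan fails because deploying ingest on site A has already reduced cpu(A) from 4 to 2, so the numeric precondition of deploy(parse,A) no longer holds.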