Wilkins: HPC In Situ Workflows Made Easy

Orcun Yildiz, Dmitriy Morozov, Arnur Nigmetov, Bogdan Nicolae, Tom Peterka

TL;DR

Wilkins is an in situ HPC workflow system designed for ease of use and scalability. It combines a data-centric YAML workflow description with a LowFive-based, HDF5-backed data transport to couple heterogeneous tasks without modifying user code, and adds a flow-control mechanism to accommodate tasks with disparate data rates. It supports ensembles, a range of topologies, and custom actions via external Python callbacks, and is demonstrated with synthetic benchmarks and two use cases in materials science and cosmology. Results show negligible overhead relative to the standalone data transport, significant speedups from flow-control strategies, and scalable ensemble execution.

Abstract

In situ approaches can accelerate the pace of scientific discovery by allowing scientists to perform data analysis at simulation time. Current in situ workflow systems, however, face challenges in handling the growing complexity and diverse computational requirements of scientific tasks. In this work, we present Wilkins, an in situ workflow system that is designed for ease of use while providing scalable and efficient execution of workflow tasks. Wilkins provides a flexible workflow description interface, employs a high-performance data transport layer based on HDF5, and supports tasks with disparate data rates through a flow control mechanism. Wilkins seamlessly couples scientific tasks that already use HDF5, without requiring task code modifications. We demonstrate these features using both synthetic benchmarks and two science use cases in materials science and cosmology.
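To make the flow-control idea concrete, the following is a minimal toy simulation of a fast producer coupled to a slower consumer. It is a sketch under stated assumptions, not Wilkins code: the function `simulate`, the strategy labels `"all"`, `"some"`, and `"latest"`, and the cost model (zero transfer cost, no buffering, a consumer 5x slower than the producer) are all hypothetical choices made for illustration.

```python
"""Toy model of producer/consumer flow control (illustrative only)."""


def simulate(strategy, iterations=10, produce_time=1.0, consume_time=5.0, every=5):
    """Simulate one producer and one consumer coupled in situ.

    Hypothetical strategies used in this sketch:
      "all"    - every iteration is delivered; the producer blocks
                 until the consumer is idle.
      "some"   - only every `every`-th iteration is delivered.
      "latest" - an iteration is delivered only if the consumer is
                 already idle when it completes; otherwise it is skipped.
    """
    if strategy not in ("all", "some", "latest"):
        raise ValueError(f"unknown strategy: {strategy}")

    clock = 0.0          # producer's clock
    consumer_free = 0.0  # time at which the consumer becomes idle
    delivered = []       # iteration numbers actually handed to the consumer

    for i in range(1, iterations + 1):
        clock += produce_time  # producer computes iteration i

        if strategy == "all":
            send = True
        elif strategy == "some":
            send = i % every == 0
        else:  # "latest"
            send = consumer_free <= clock

        if send:
            # Hand-off happens only when the consumer is idle, so the
            # producer may have to wait here.
            clock = max(clock, consumer_free)
            consumer_free = clock + consume_time
            delivered.append(i)

    return {
        "producer_done": clock,
        "makespan": max(clock, consumer_free),
        "delivered": delivered,
    }


if __name__ == "__main__":
    for name in ("all", "some", "latest"):
        print(name, simulate(name))
```

In this toy model the blocking strategy holds the producer until t = 46, whereas the two skipping strategies let it finish at t = 10 at the cost of delivering fewer iterations. These numbers come from the sketch's assumed cost model, not from the paper's measurements.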

Paper Structure

This paper contains 27 sections, 10 figures, and 3 tables.

Figures (10)

  • Figure 1: Overview of the Wilkins system.
  • Figure 2: Example of three tasks coupled through Wilkins.
  • Figure 3: Example of ensemble coupling performed by Wilkins in a fan-in topology with 4 producer and 2 consumer instances.
  • Figure 4: Time to write/read grid and particles between 1 producer and 1 consumer task, comparing using LowFive alone with Wilkins.
  • Figure 5: Gantt charts for the execution of producer and 5x slow consumer for 10 iterations under different flow control strategies.
  • ...and 5 more figures