Scalable Runtime Architecture for Data-driven, Hybrid HPC and ML Workflow Applications

Andre Merzky, Mikhail Titov, Matteo Turilli, Ozgur Kilic, Tianle Wang, Shantenu Jha

TL;DR

The paper addresses the challenge of integrating AI/ML with high-performance computing to enable scalable AI-out-HPC hybrid workflows on leadership-class platforms. It introduces a service-based runtime built on top of RADICAL-Pilot, including ServiceManager and DataManager, to support distributed ML serving and heterogeneous task execution across local and remote resources, with Ollama used for prototyping LLM hosting. The authors present three LUCID pipelines (Cell Painting, Signature Detection, UQ) and provide a preliminary architecture blueprint, a prototype implementation, and an experimental evaluation of bootstrap, latency, and inference scalability. The results show that the overhead of service-based execution is negligible compared to model inference times and that the runtime supports asynchronous, concurrent resource utilization. The authors outline future work on adopting HPC-optimized serving technologies and adaptive scheduling. This work lays a foundation for scalable, interoperable AI-out-HPC workflows and motivates further integration of advanced model serving stacks on HPC platforms.

Abstract

Hybrid workflows combining traditional HPC and novel ML methodologies are transforming scientific computing. This paper presents the architecture and implementation of a scalable runtime system that extends RADICAL-Pilot with service-based execution to support AI-out-HPC workflows. Our runtime system enables distributed ML capabilities, efficient resource management, and seamless HPC/ML coupling across local and remote platforms. Preliminary experimental results show that our approach manages concurrent execution of ML models across local and remote HPC/cloud resources with minimal architectural overheads. This lays the foundation for prototyping three representative data-driven workflow applications and executing them at scale on leadership-class HPC platforms.

Paper Structure

This paper contains 14 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: HPC/ML capabilities, technologies, and entities stack. Each layer contributes distinct capabilities to manage entities that enable scalable, concurrent execution of heterogeneous HPC/ML workflows. We display only a representative subset of the technology ecosystem available (HF = Hugging Face Transformers and DS = DeepSpeed).
  • Figure 2: Runtime architecture to support HPC/ML coupling. We extended RADICAL-Pilot with service-specific capabilities to enable large-scale deployment of ML capabilities on HPC. Numbers indicate the execution model of the service capabilities enabled by this architecture.
  • Figure 3: Service Bootstrap Times. Individual contributions to the overall bootstrap time for an increasing number of local service instances.
  • Figure 4: Service Response Times for local NOOP inference calls. Strong scaling (top, number of clients == 16) and weak scaling (bottom, number of services == number of clients).
  • Figure 5: Service Response Time for remote NOOP inference calls. Strong scaling (top, number of clients == 16) and weak scaling (bottom, number of services == number of clients).
  • ...and 1 more figure