Scalable Runtime Architecture for Data-driven, Hybrid HPC and ML Workflow Applications
Andre Merzky, Mikhail Titov, Matteo Turilli, Ozgur Kilic, Tianle Wang, Shantenu Jha
TL;DR
The paper addresses the challenge of integrating AI/ML with high-performance computing (HPC) to enable scalable AI-out-HPC hybrid workflows on leadership-class platforms. It introduces a service-based runtime built on RADICAL-Pilot, with ServiceManager and DataManager components, that supports distributed ML serving and heterogeneous task execution across local and remote resources; Ollama is used to prototype LLM hosting. The authors present three LUCID pipelines (Cell Painting, Signature Detection, UQ) and provide a preliminary architecture blueprint, a prototype implementation, and an experimental evaluation of bootstrap, latency, and inference scalability. The results show that the overhead of service-based execution is negligible compared to model inference times and that the runtime supports asynchronous, concurrent use of resources. The authors outline future work on adopting HPC-optimized serving technologies and adaptive scheduling. This work lays a foundation for scalable, interoperable AI-out-HPC workflows and motivates further integration of advanced model serving stacks on HPC platforms.
Abstract
Hybrid workflows combining traditional HPC and novel ML methodologies are transforming scientific computing. This paper presents the architecture and implementation of a scalable runtime system that extends RADICAL-Pilot with service-based execution to support AI-out-HPC workflows. Our runtime system enables distributed ML capabilities, efficient resource management, and seamless HPC/ML coupling across local and remote platforms. Preliminary experimental results show that our approach manages concurrent execution of ML models across local and remote HPC/cloud resources with minimal architectural overheads. This lays the foundation for prototyping three representative data-driven workflow applications and executing them at scale on leadership-class HPC platforms.
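The service-based execution pattern summarized above can be illustrated with a minimal, self-contained sketch. It is not the RADICAL-Pilot, ServiceManager, or Ollama API: `MockModelService`, `ToyServiceManager`, and `run_workflow` are hypothetical names introduced only for illustration, and inference is simulated with a short sleep. The sketch assumes that services are bootstrapped once and that independent tasks then send requests to a local and a remote service concurrently.

```python
import asyncio
import time
from dataclasses import dataclass, field


@dataclass
class MockModelService:
    """Hypothetical stand-in for a hosted model endpoint (assumption, not a real API)."""
    name: str
    inference_seconds: float
    ready: bool = False
    served: int = 0

    async def bootstrap(self) -> None:
        # Simulate model loading before the service accepts requests.
        await asyncio.sleep(0.01)
        self.ready = True

    async def infer(self, prompt: str) -> str:
        if not self.ready:
            raise RuntimeError(f"service {self.name} is not ready")
        # Simulated inference time; a real service would run a model here.
        await asyncio.sleep(self.inference_seconds)
        self.served += 1
        return f"{self.name}:{prompt}"


@dataclass
class ToyServiceManager:
    """Hypothetical registry that starts services and routes requests to them."""
    services: dict = field(default_factory=dict)

    async def start(self, *services: MockModelService) -> None:
        # Bootstrap all services concurrently, then register them by name.
        await asyncio.gather(*(s.bootstrap() for s in services))
        for s in services:
            self.services[s.name] = s

    async def request(self, service_name: str, prompt: str) -> str:
        return await self.services[service_name].infer(prompt)


async def run_workflow(n_tasks: int = 8):
    manager = ToyServiceManager()
    await manager.start(
        MockModelService("local", 0.05),
        MockModelService("remote", 0.05),
    )
    names = ["local", "remote"]
    start = time.perf_counter()
    # Tasks alternate between the two services and run concurrently.
    replies = await asyncio.gather(
        *(manager.request(names[i % 2], f"task-{i}") for i in range(n_tasks))
    )
    elapsed = time.perf_counter() - start
    return replies, elapsed, manager


if __name__ == "__main__":
    replies, elapsed, manager = asyncio.run(run_workflow())
    print(f"{len(replies)} replies in {elapsed:.3f} s")
```

Because the requests overlap, the total wall time in this toy setup is governed by the simulated inference time rather than by the number of tasks, which is the kind of behavior the evaluation of overhead versus inference time is concerned with. The timings here are synthetic and say nothing about the measured results in the paper.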
