XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing
Torsten Hoefler, Marcin Copik, Pete Beckman, Andrew Jones, Ian Foster, Manish Parashar, Daniel Reed, Matthias Troyer, Thomas Schulthess, Dan Ernst, Jack Dongarra
TL;DR
Acceleration as a Service (XaaS) addresses the mismatch between HPC performance needs and cloud-centric productivity by proposing a unified platform built on performance-portable containers. The approach rests on three technical pillars: performance-portable infrastructure, high-performance communication and I/O, and low-overhead allocation and scheduling. It describes enabling technologies such as library hooks, RDMA-based data paths, and flexible invocation. The paper identifies opportunities in scheduling, accounting, hardware co-design, and security for converging HPC and cloud into productive, high-performance accelerated cloud computing. It envisions domain-specific containers and portable, scalable services that run across providers, broadening access to AI/ML, climate simulations, and other resource-intensive workloads while improving utilization and efficiency.
Abstract
HPC and cloud have evolved independently, specializing their innovations toward performance and productivity, respectively. Acceleration as a Service (XaaS) is a recipe to empower both fields with a shared execution platform that provides transparent access to computing resources, regardless of the underlying cloud or HPC service provider. Bridging HPC and cloud advancements, XaaS presents a unified architecture built on performance-portable containers. Our converged model concentrates on low-overhead, high-performance communication and computing, targeting resource-intensive workloads from climate simulations to machine learning. XaaS lifts the restricted allocation model of Function-as-a-Service (FaaS), allowing users to benefit from the flexibility and efficient resource utilization of serverless while supporting long-running and performance-sensitive workloads from HPC.
