Combining Serverless and High-Performance Computing Paradigms to support ML Data-Intensive Applications
Mills Staylor, Arup Kumar Sarker, Gregor von Laszewski, Geoffrey Fox, Yue Cheng, Judy Fox
TL;DR
The paper tackles the data-transfer and communication bottleneck in serverless, data-intensive ML workflows by integrating Cylon with an FMI-inspired NAT hole punching communicator and UCX/UCC BSP infrastructure, fusing serverless and serverful HPC. It presents a high-performance, extensible architecture built on Apache Arrow and Parquet, with serverless and serverful execution paths, and demonstrates near EC2-level weak scaling while revealing Lambda's limitations in strong scaling. Key contributions include the serverless communicator design, UCX/UCC-based BSP integration, containerized cross-platform support, and empirical results across AWS Lambda, EC2, and an HPC cluster. The work points to practical implications for genomics, hydrology, astronomy, and related domains, where large-scale ML data processing in the cloud can be made more cost-effective and scalable.
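The weak and strong scaling claims above rest on the standard parallel-efficiency definitions. The sketch below is only an illustration of those textbook formulas: the function names are hypothetical and the timings in the usage note are made up, not results from the paper.

```python
def strong_scaling_efficiency(t1: float, tp: float, workers: int) -> float:
    """Efficiency when the TOTAL problem size is fixed.

    t1: runtime on one worker, tp: runtime on `workers` workers.
    Ideal strong scaling gives tp == t1 / workers, i.e. efficiency 1.0.
    """
    if t1 <= 0 or tp <= 0 or workers < 1:
        raise ValueError("timings must be positive and workers >= 1")
    return t1 / (workers * tp)


def weak_scaling_efficiency(t1: float, tp: float) -> float:
    """Efficiency when the PER-WORKER problem size is fixed.

    t1: runtime on one worker, tp: runtime after growing both the
    worker count and the data by the same factor.
    Ideal weak scaling gives tp == t1, i.e. efficiency 1.0.
    """
    if t1 <= 0 or tp <= 0:
        raise ValueError("timings must be positive")
    return t1 / tp
```

With illustrative numbers, a job that takes 100 s on one worker and 25 s on four has a strong scaling efficiency of 1.0, while a weak scaling run that grows from 10 s to 12.5 s has an efficiency of 0.8.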
Abstract
Data is found everywhere, from health and human infrastructure to the surge of sensors and the proliferation of internet-connected devices. To handle this growth, the data engineering field has expanded significantly in recent years in both research and industry. Traditionally, data engineering, machine learning, and AI workloads have been run on large clusters within data center environments, requiring substantial investment in hardware and maintenance. With the rise of the public cloud, it is now possible to run large applications across nodes without owning or maintaining hardware. Serverless functions such as AWS Lambda provide horizontal scaling and precise billing without the hassle of managing traditional cloud infrastructure. However, when processing large datasets, users often rely on external storage options that are significantly slower than the direct communication typical of HPC clusters. We introduce Cylon, a high-performance distributed data frame solution that has shown promising results for data processing using Python. We describe how we took inspiration from the FMI library and designed a serverless communicator to tackle the communication and performance issues associated with serverless functions. With our design, which implements direct communication via NAT traversal TCP hole punching, we demonstrate that the performance of AWS Lambda falls below one percent of strong scaling experiments compared to serverful AWS (EC2) and HPC clusters.
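To make the direct-communication idea more concrete, the sketch below shows only the bookkeeping that a rendezvous step for hole punching typically needs: each function reports in, the service records the public endpoint it observed for that rank, and once every rank has registered, each one can fetch its peers' endpoints and attempt direct connections. This is an assumption-laden illustration with no real networking; the `Rendezvous` class and its methods are hypothetical and are not the API of Cylon or FMI.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# (public IP, public port) as observed by the hypothetical rendezvous service
Endpoint = Tuple[str, int]


@dataclass
class Rendezvous:
    """In-memory stand-in for a rendezvous service (hypothetical)."""
    world_size: int
    endpoints: Dict[int, Endpoint] = field(default_factory=dict)

    def register(self, rank: int, observed: Endpoint) -> None:
        # Record the NAT-translated endpoint seen for this rank.
        if not 0 <= rank < self.world_size:
            raise ValueError(f"rank {rank} outside world of size {self.world_size}")
        self.endpoints[rank] = observed

    def peers_of(self, rank: int) -> Optional[Dict[int, Endpoint]]:
        # Until every rank has registered, there is nothing to connect to.
        if len(self.endpoints) < self.world_size:
            return None
        return {r: ep for r, ep in self.endpoints.items() if r != rank}
```

In a real deployment each function would then open outbound connections to the returned endpoints at roughly the same time, so that the NAT mappings created by the outgoing packets let the peers' packets through; that step is deliberately left out here.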
