Joint$λ$: Orchestrating Serverless Workflows on Jointcloud FaaS Systems
Jianfei Liu, Rui Li, Zhilin Yang, Peichang Shi, Guodong Yi, Huaimin Wang
TL;DR
Vendor lock-in and cross-cloud overhead motivate a distributed approach to serverless workflow orchestration. The paper introduces Jointλ, a function-side runtime with a compatibility layer, Backend-Shim, that orchestrates workflows across multiple FaaS systems without a centralized controller. It exploits inter-cloud heterogeneity to reduce makespan and cost while providing fault tolerance. Jointλ achieves exactly-once execution through data production and invocation checkpoints, supports automatic failover, and enables cross-cloud data transfer and collaboration using per-function state and coordinated storage. An empirical evaluation on AWS Lambda and ALiYun FC across four workflows reports up to 3.3$\times$ lower latency and up to 65\% lower cost than single-cloud baselines, up to 4.0$\times$ lower latency and up to 4.5$\times$ lower cost than cross-cloud baselines, and negligible failover overhead. The system is released as open source.
Abstract
Existing serverless workflow orchestration systems are predominantly designed for a single-cloud FaaS system, which leads to vendor lock-in and limits the performance optimization, cost reduction, and availability of applications. However, orchestrating serverless workflows on Jointcloud FaaS systems faces two main challenges: 1) additional overhead caused by centralized cross-cloud orchestration; and 2) a lack of reliable failover and fault-tolerance mechanisms for cross-cloud serverless workflows. To address these challenges, we propose Joint$λ$, a distributed runtime system designed to orchestrate serverless workflows on multiple FaaS systems without relying on a centralized orchestrator. Joint$λ$ introduces a compatibility layer, Backend-Shim, which leverages inter-cloud heterogeneity to optimize makespan and reduce costs under on-demand billing. By using function-side orchestration instead of centralized nodes, it enables independent function invocations and data transfers, reducing cross-cloud communication overhead. For high availability, it ensures exactly-once execution of serverless workflows on Jointcloud FaaS systems via datastores and failover mechanisms. We validate Joint$λ$ on two heterogeneous FaaS systems, AWS and ALiYun, with four workflows. Compared to the most advanced commercial orchestration services for single-cloud serverless workflows, Joint$λ$ reduces latency by up to 3.3$\times$ and cost by up to 65\%. Joint$λ$ is also up to 4.0$\times$ faster than state-of-the-art orchestrators for cross-cloud serverless workflows, reduces cost by up to 4.5$\times$, and provides strong execution guarantees.
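The abstract does not spell out how checkpoint-based exactly-once execution works, so the following Python sketch is only one plausible reading of the idea, not the Joint$λ$ implementation. It assumes a datastore with a conditional "put if absent" write; `CheckpointStore`, `run_step`, the key layout, and the `invoke` callback are all hypothetical names introduced for illustration. A function records its output once (a data production checkpoint) and claims a key before triggering each successor (an invocation checkpoint), so a retried function neither recomputes its result nor re-invokes downstream functions.

```python
from typing import Any, Callable, Dict, List


class CheckpointStore:
    """In-memory stand-in for a datastore with conditional writes (assumption)."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}

    def put_if_absent(self, key: str, value: Any) -> bool:
        """Store value only if key is unset; return True if this call wrote it."""
        if key in self._items:
            return False
        self._items[key] = value
        return True

    def contains(self, key: str) -> bool:
        return key in self._items

    def get(self, key: str) -> Any:
        return self._items[key]


def run_step(
    store: CheckpointStore,
    workflow_id: str,
    step: str,
    fn: Callable[[Any], Any],
    payload: Any,
    successors: List[str],
    invoke: Callable[[str, Any], None],
) -> Any:
    """Function-side wrapper (hypothetical): run fn and trigger successors once."""
    out_key = f"{workflow_id}/{step}/output"
    # Data production checkpoint: a retry reuses the stored output.
    if not store.contains(out_key):
        store.put_if_absent(out_key, fn(payload))
    output = store.get(out_key)
    for nxt in successors:
        # Invocation checkpoint: only the caller that claims the key invokes.
        if store.put_if_absent(f"{workflow_id}/{step}->{nxt}/invoked", True):
            invoke(nxt, output)
    return output
```

Calling `run_step` twice with the same `workflow_id` and `step` returns the same output while `fn` and `invoke` each run once. The sketch deliberately ignores a crash between claiming an invocation checkpoint and actually invoking the successor, which a real system would have to handle during failover.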
