Joint$λ$: Orchestrating Serverless Workflows on Jointcloud FaaS Systems

Jianfei Liu, Rui Li, Zhilin Yang, Peichang Shi, Guodong Yi, Huaimin Wang

TL;DR

Vendor lock-in and cross-cloud overhead motivate a distributed approach to serverless workflow orchestration. The paper introduces Joint$λ$, a function-side runtime with a compatibility layer, Backend-Shim, that orchestrates workflows across multiple FaaS systems without a centralized controller and exploits inter-cloud heterogeneity to reduce makespan and cost while providing fault tolerance. It achieves exactly-once execution through data-production and invocation checkpoints, supports automatic failover, and enables cross-cloud data transfer and collaboration using per-function state and coordinated storage. An empirical evaluation on AWS Lambda and ALiYun FC across four workflows reports up to 3.3$\times$ lower latency and up to 65\% lower cost than single-cloud commercial orchestration services, up to 4.0$\times$ lower latency and up to 4.5$\times$ lower cost than cross-cloud orchestrators, and negligible failover overhead. The system is released as open source.

Abstract

Existing serverless workflow orchestration systems are predominantly designed for a single-cloud FaaS system, leading to vendor lock-in. This restricts the performance optimization, cost reduction, and availability of applications. Orchestrating serverless workflows on Jointcloud FaaS systems, however, faces two main challenges: 1) additional overhead caused by centralized cross-cloud orchestration; and 2) a lack of reliable failover and fault-tolerance mechanisms for cross-cloud serverless workflows. To address these challenges, we propose Joint$λ$, a distributed runtime system designed to orchestrate serverless workflows on multiple FaaS systems without relying on a centralized orchestrator. Joint$λ$ introduces a compatibility layer, Backend-Shim, which leverages inter-cloud heterogeneity to optimize makespan and reduce costs under on-demand billing. By using function-side orchestration instead of centralized nodes, it enables independent function invocations and data transfers, reducing cross-cloud communication overhead. For high availability, it ensures exactly-once execution via datastores and failover mechanisms for serverless workflows on Jointcloud FaaS systems. We validate Joint$λ$ on two heterogeneous FaaS systems, AWS and ALiYun, with four workflows. Compared to the most advanced commercial orchestration services for single-cloud serverless workflows, Joint$λ$ reduces latency by up to 3.3$\times$ and cost by up to 65\%. Compared to the state-of-the-art orchestrators for cross-cloud serverless workflows, Joint$λ$ is up to 4.0$\times$ faster and reduces cost by up to 4.5$\times$, while providing strong execution guarantees.
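
To make the idea of exactly-once execution via datastore checkpoints more concrete, the following is a minimal sketch and not the paper's implementation. It assumes a datastore that offers a conditional "put if absent" write; the names `InMemoryStore`, `put_if_absent`, `run_step`, and the key layout are all hypothetical and chosen only for illustration. The function itself (not a central orchestrator) records a data-production checkpoint for its output and an invocation checkpoint before triggering its successor, so a retried or duplicated invocation reuses the stored result and does not trigger the successor again.

```python
from typing import Any, Callable, Dict


class InMemoryStore:
    """Hypothetical stand-in for a cloud datastore with a conditional write."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}

    def put_if_absent(self, key: str, value: Any) -> bool:
        # Returns True only for the first writer of a key.
        if key in self._items:
            return False
        self._items[key] = value
        return True

    def contains(self, key: str) -> bool:
        return key in self._items

    def get(self, key: str) -> Any:
        return self._items[key]


def run_step(
    store: InMemoryStore,
    workflow_id: str,
    step: str,
    fn: Callable[[Any], Any],
    arg: Any,
    invoke_next: Callable[[Any], None],
) -> Any:
    """Run one workflow step with function-side checkpoints (illustrative only)."""
    out_key = f"{workflow_id}/{step}/output"    # assumed key layout
    inv_key = f"{workflow_id}/{step}/invoked"   # assumed key layout

    # Data-production checkpoint: compute only if no output is recorded yet,
    # and keep whichever output was written first.
    if not store.contains(out_key):
        store.put_if_absent(out_key, fn(arg))
    result = store.get(out_key)

    # Invocation checkpoint: only the caller that claims the key
    # triggers the successor function.
    if store.put_if_absent(inv_key, True):
        invoke_next(result)
    return result
```

In this simplified form, calling `run_step` twice with the same `workflow_id` and `step` runs `fn` once and calls `invoke_next` once. A real system would also have to handle a crash between claiming the invocation checkpoint and actually invoking the successor, which this sketch deliberately leaves out.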

Paper Structure

This paper contains 25 sections, 20 figures, 3 tables.

Figures (20)

  • Figure 1: The P95 latency distribution with batch sizes 2 and 4 when running BERT on different functions. gpu4 denotes a 4 GB GPU configuration and gpu8 an 8 GB GPU configuration. The GPU type is A10. X means the configuration was unable to run.
  • Figure 2: The average total cost with batch size = 2, 4 when running BERT on different functions.
  • Figure 3: Logically centralized orchestrator organizing serverless workflows on Jointcloud FaaS systems.
  • Figure 4: Abstraction for Joint$\lambda$. The data structure for the management of function invocation and data transfer in the function-side workflow orchestrator.
  • Figure 5: Sub-graph definitions and major primitives for workflow basic patterns. Invocation primitives and data transfer primitives (bold text in the figure) can be combined flexibly.
  • ...and 15 more figures