
GeoFF: Federated Serverless Workflows with Data Pre-Fetching

Natalie Carl, Trever Schirmer, Tobias Pfandzelter, David Bermbach

TL;DR

GeoFF tackles the absence of cross-provider FaaS workflow support by introducing a decentralized choreography middleware that executes workflows across heterogeneous platforms and regions. It combines function pre-warming with data pre-fetching and enables ad-hoc workflow recomposition through per-invocation workflow specifications, all without a central orchestrator. The authors implement a prototype across tinyFaaS, AWS Lambda, and Google Cloud Functions and demonstrate latency reductions of more than 50 percent in a document-processing use case. This work advances federated, data-aware serverless computing by reducing cross-provider data movement, enabling flexible deployment, and improving fault tolerance.
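
The choreography idea can be illustrated with a minimal Python sketch. It assumes that each invocation carries the remaining workflow specification as an ordered list of function names. The registry, the handler names (`extract`, `upper`, `count_words`), and the `middleware` and `start_workflow` functions are hypothetical stand-ins, not GeoFF's API. Remote cross-platform invocations are modeled as direct calls so that the example runs locally.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str], str]


# Hypothetical handlers; in GeoFF these would be FaaS functions deployed on
# different platforms (e.g. tinyFaaS, AWS Lambda, Google Cloud Functions).
def extract(doc: str) -> str:
    return doc.strip()


def upper(doc: str) -> str:
    return doc.upper()


def count_words(doc: str) -> str:
    return str(len(doc.split()))


# Hypothetical name -> handler lookup; a real deployment would map names to
# platform-specific endpoints and invoke them over the network.
REGISTRY: Dict[str, Handler] = {
    "extract": extract,
    "upper": upper,
    "count_words": count_words,
}


def middleware(step: str, payload: str, remaining: List[str],
               trace: List[str]) -> str:
    # Run this step's handler, then pass the output together with the rest of
    # the workflow specification to the next step. No central orchestrator:
    # each step only sees the specification it was invoked with.
    trace.append(step)
    output = REGISTRY[step](payload)
    if not remaining:
        return output
    return middleware(remaining[0], output, remaining[1:], trace)


def start_workflow(spec: List[str], payload: str) -> Tuple[str, List[str]]:
    # The client invokes only the first step, handing over input and spec.
    trace: List[str] = []
    result = middleware(spec[0], payload, spec[1:], trace)
    return result, trace
```

Calling `start_workflow(["extract", "upper"], "  hello geoff ")` runs two steps and returns `"HELLO GEOFF"`. Passing `["extract", "count_words"]` instead recomposes the workflow ad hoc without redeploying anything and returns `"2"`.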

Abstract

Function-as-a-Service (FaaS) is a popular cloud computing model in which applications are implemented as workflows of multiple independent functions. While cloud providers usually offer composition services for such workflows, they do not support cross-platform workflows, forcing developers to hardcode the composition logic. Furthermore, FaaS workflows tend to be slow due to cascading cold starts, inter-function latency, and data download latency on the critical path. In this paper, we propose GeoFF, a serverless choreography middleware that executes FaaS workflows across different public and private FaaS platforms, including ad-hoc workflow recomposition. Furthermore, GeoFF supports function pre-warming and data pre-fetching. This minimizes end-to-end workflow latency by taking cold starts and data download latency off the critical path. In experiments with our proof-of-concept prototype and a realistic application, we were able to reduce end-to-end latency by more than 50%.
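
Pre-fetching can be sketched as follows. The sketch assumes that the middleware of a step receives an early pre-fetch signal from its predecessor and starts the download in a background thread. When the real invocation arrives, it reuses that in-flight download. `PrefetchingMiddleware`, `OBJECT_STORE`, and the object key are hypothetical, and the network transfer is simulated with `time.sleep`.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List

# Hypothetical object store; in a real deployment this would be a remote
# bucket (e.g. S3 or Cloud Storage) reached over the network.
OBJECT_STORE: Dict[str, bytes] = {"docs/report.pdf": b"%PDF-1.7 example"}


class PrefetchingMiddleware:
    """Hypothetical middleware of the *next* step in a workflow."""

    def __init__(self, download_delay: float = 0.05) -> None:
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._pending: Dict[str, "Future[bytes]"] = {}
        self._lock = threading.Lock()
        self._delay = download_delay
        self.events: List[str] = []  # list.append is atomic in CPython

    def _download(self, key: str) -> bytes:
        self.events.append("download:" + key)
        time.sleep(self._delay)  # simulated network transfer time
        return OBJECT_STORE[key]

    def on_prefetch(self, key: str) -> None:
        # Sent by the predecessor before it runs its own handler; receiving it
        # also triggers this instance's cold start (pre-warming).
        self.events.append("prefetch:" + key)
        with self._lock:
            if key not in self._pending:
                self._pending[key] = self._pool.submit(self._download, key)

    def on_invoke(self, key: str) -> int:
        # The actual invocation: reuse the in-flight download if there is one,
        # otherwise fall back to downloading on the critical path.
        self.events.append("invoke:" + key)
        with self._lock:
            fut = self._pending.pop(key, None)
        data = fut.result() if fut is not None else self._download(key)
        return len(data)  # stand-in for the function logic (e.g. OCR)

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

In this design, the time the predecessor spends between calling `on_prefetch` and calling `on_invoke` overlaps with the download. Without a pre-fetch signal, `on_invoke` still works, but the download then sits on the critical path.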

Paper Structure (18 sections, 8 figures)

Figures (8)

  • Figure 1: The GeoFF deployer takes the code of the functions (here $f_A$ to $f_E$), their dependencies, and a deployment configuration to deploy an application across different public clouds, private clouds, and edge nodes. Each resulting FaaS function consists of the developer's function handler, a platform-specific wrapper, and the choreography middleware that handles workflow execution and data pre-fetching. Clients start workflows by invoking the first step with a function input and a workflow specification.
  • Figure 2: Workflow A is executed sequentially: The second step $\lambda_2$ is invoked only after completion of $\lambda_1$. For Workflow B, in contrast, $\lambda_2$ already experiences its cold start and downloads the required data while $\lambda_1$ is still executing its function logic. This reduces the total workflow duration.
  • Figure 3: The document preparation workflow is adapted from our previous work [schirmer2023profaastinate]: users initiate a pre-check that verifies the PDF's correctness and stores it in an object storage system. Subsequently, asynchronous steps for virus scanning, optical character recognition (OCR), and e-mail notification are triggered.
  • Figure 4: This graph shows the cumulative distribution, up to the 99th percentile, of total workflow duration measurements for the document processing workflow in the data pre-fetching experiment. At the median, pre-fetching reduces the total workflow duration by 53.02% compared to the baseline.
  • Figure 5: In the second use case (function shipping), only the OCR function pre-fetches. The function is deployed to the AWS regions eu-central-1 and us-east-1.
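
The two timelines in Figure 2 can be compared with a simplified critical-path model. It assumes (hypothetically) that the second step is signalled as soon as the first step's instance is up, and it ignores invocation and network overheads. The `Step` type and the durations are illustrative values in milliseconds, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    cold_start: int  # ms until the function instance is ready
    download: int    # ms to fetch the step's input data
    compute: int     # ms of function logic


def sequential(s1: Step, s2: Step) -> int:
    # Workflow A: step 2 is invoked only after step 1 finishes, so its cold
    # start and data download both lie on the critical path.
    return (s1.cold_start + s1.download + s1.compute
            + s2.cold_start + s2.download + s2.compute)


def prefetched(s1: Step, s2: Step) -> int:
    # Workflow B: step 2 is signalled once step 1's instance is up
    # (assumption), so its cold start and download overlap with step 1.
    signal = s1.cold_start
    step1_done = s1.cold_start + s1.download + s1.compute
    step2_ready = signal + s2.cold_start + s2.download
    return max(step1_done, step2_ready) + s2.compute
```

With `Step(500, 300, 1000)` followed by `Step(500, 800, 1000)`, the sequential workflow takes 4100 ms and the pre-fetching workflow takes 2800 ms, because the second step's cold start and download fully overlap with the first step's download and logic. If the first step's download plus logic is shorter than the second step's cold start plus download, the remainder of that setup time reappears on the critical path.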