Dirigent: Lightweight Serverless Orchestration

Lazar Cvetković; François Costa; Mihajlo Djokic; Michal Friedman; Ana Klimovic

TL;DR

Dirigent is a clean-slate system architecture for FaaS orchestration built on three key principles: it optimizes internal cluster manager abstractions to simplify state management, eliminates persistent state updates on the critical path of function invocations, and runs monolithic control and data planes to minimize internal communication overheads and maximize throughput.

Abstract

While Function as a Service (FaaS) platforms can initialize function sandboxes on worker nodes in 10-100s of milliseconds, the latency to schedule functions in real FaaS clusters can be orders of magnitude higher. The current approach of building FaaS cluster managers on top of legacy orchestration systems (e.g., Kubernetes) leads to high scheduling delays when clusters experience high sandbox churn, which is common for FaaS. Generic cluster managers use many hierarchical abstractions and internal components to manage and reconcile cluster state with frequent persistent updates. This becomes a bottleneck for FaaS since the cluster state frequently changes as sandboxes are created on the critical path of requests. Based on our root cause analysis of performance issues in existing FaaS cluster managers, we propose Dirigent, a clean-slate system architecture for FaaS orchestration with three key principles. First, Dirigent optimizes internal cluster manager abstractions to simplify state management. Second, it eliminates persistent state updates on the critical path of function invocations, leveraging the fact that FaaS abstracts sandbox locations from users to relax exact state reconstruction guarantees. Finally, Dirigent runs monolithic control and data planes to minimize internal communication overheads and maximize throughput. We compare Dirigent to state-of-the-art FaaS platforms and show that Dirigent reduces 99th percentile per-function scheduling latency for a production workload by 2.79x compared to AWS Lambda. Dirigent can spin up 2500 sandboxes per second at low latency, which is 1250x more than Knative.
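
The second principle can be illustrated with a minimal sketch. This is not Dirigent's implementation: the `ControlPlane` class, its fields, and the idea of rebuilding sandbox state from worker reports after a restart are hypothetical choices made for illustration. The sketch only shows the general pattern of persisting rarely changing state (function registrations) while keeping frequently changing sandbox state in memory.

```python
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Hypothetical control plane: only function registrations are persisted."""

    # Stand-in for a durable, replicated store (assumption for illustration).
    durable_store: dict = field(default_factory=dict)
    # In-memory only: sandbox_id -> (function name, worker name).
    sandboxes: dict = field(default_factory=dict)
    persistent_writes: int = 0

    def register_function(self, name: str, image: str) -> None:
        # Rare, user-initiated operation: persisted, off the invocation path.
        self.durable_store[name] = image
        self.persistent_writes += 1

    def create_sandbox(self, sandbox_id: str, function: str, worker: str) -> None:
        # Critical path of a cold invocation: memory update only, no durable write.
        if function not in self.durable_store:
            raise KeyError(f"unknown function: {function}")
        self.sandboxes[sandbox_id] = (function, worker)

    def recover(self, worker_reports: dict) -> None:
        # Assumed recovery path: rebuild sandbox state from what workers report.
        # The result may differ from the pre-crash state, which is acceptable
        # when sandbox locations are hidden from users.
        self.sandboxes = {}
        for worker, running in worker_reports.items():
            for sandbox_id, function in running:
                if function in self.durable_store:
                    self.sandboxes[sandbox_id] = (function, worker)


cp = ControlPlane()
cp.register_function("hello", "registry.example/hello:1")
cp.create_sandbox("s1", "hello", "worker-1")
cp.create_sandbox("s2", "hello", "worker-2")
print(cp.persistent_writes)  # sandbox creations added no durable writes
```

In this sketch, sandbox churn never touches the durable store, so the number of persistent writes depends only on how often functions are registered, not on how often sandboxes are created.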

Paper Structure (24 sections, 11 figures, 3 tables)

This paper contains 24 sections, 11 figures, 3 tables.

Introduction
Background and Motivation
FaaS Cluster Management Requirements
The Kubernetes–FaaS Mismatch
Related Work
Dirigent Design Approach
System Overview
Design Principles
Life of a Request
Fault Tolerance
Component-level fault tolerance
Request-level fault tolerance
Implementation and Limitations
Evaluation
Experimental Methodology
...and 9 more sections

Figures (11)

Figure 1: End-to-end latency breakdown of cold invocation bursts in Knative. Sandbox creation involves sequentially creating two containers: the user-code container and its sidecar. Sandbox init is the time it takes to pass health probes.
Figure 2: AWS Lambda end-to-end latency CDFs with different cold start bursts of hello-world functions. We pre-cache container images, based on insights from Brooker et al. [brooker:firecracker_snapshots].
Figure 3: Rate of sandbox creation over time in a 30-minute window (after a 10-minute warmup) of the 70K-function Azure trace [shahrad:serverless], simulated on a 1000-worker-node cluster with default Knative scheduling policies. Each sandbox processes 1 request at a time, the default for FaaS platforms [aws:sandbox_concurrency, gcf:invocation_level_guarantees].
Figure 4: Knative system architecture, which builds on K8s. This diagram is simplified, showing only key components which all run as independent microservices. K8s components are blue, while yellow components are added by Knative.
Figure 5: CDF of per-invocation scheduling latency and per-function mean scheduling latency when executing a 500-function Azure trace [ustiugov:in_vitro, shahrad:serverless] on a 93-worker cluster.
...and 6 more figures
