Flock: A Low-Cost Streaming Query Engine on FaaS Platforms

Gang Liao, Amol Deshpande, Daniel J. Abadi

TL;DR

Flock targets the latency and cost overheads of existing serverless streaming analytics systems, which rely on external storage for inter-function communication. It achieves real-time analytics on FaaS by passing data through function invocation payloads. This removes both external-storage shuffles and the need for a centralized coordinator. Flock also introduces a template-based approach to function generation and supports both SQL and DataFrame APIs. The system builds on a generic function template, a DAG-based dataflow, and multi-level shuffling, and running on ARM Graviton2 hardware yields notable price-performance benefits. Evaluations on the NEXMark and Yahoo Streaming benchmarks show substantial cost reductions (often more than an order of magnitude versus Flink) with competitive latency and throughput. This suggests that payload-based streaming on FaaS is practical for real-time analytics.
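
Passing data through invocation payloads means a batch of records must fit within whatever payload size the platform accepts. The sketch below is a minimal illustration of that idea, not Flock's implementation. It serializes a batch and compresses it, then splits it into numbered chunks. The receiving function reassembles the batch once every chunk has arrived. The payload limit, the envelope fields (`query_id`, `seq`, `total`, `data`) and the helper names are hypothetical assumptions, and JSON plus zlib merely stand in for whatever encoding a real engine would use.

```python
import base64
import json
import zlib

# Hypothetical per-invocation payload budget in bytes (an assumption: real
# limits depend on the FaaS platform and on the invocation type).
PAYLOAD_LIMIT = 256 * 1024
# Bytes reserved for the JSON envelope around each chunk; assumed to be
# enough as long as query ids are short.
ENVELOPE_RESERVE = 512


def encode_payloads(query_id, rows, limit=PAYLOAD_LIMIT):
    """Serialize and compress `rows`, then split them into payloads of at most
    `limit` bytes each (assuming a short `query_id`)."""
    raw = zlib.compress(json.dumps(rows).encode("utf-8"))
    blob = base64.b64encode(raw).decode("ascii")  # JSON-safe text
    step = limit - ENVELOPE_RESERVE
    pieces = [blob[i:i + step] for i in range(0, len(blob), step)]
    return [
        json.dumps({"query_id": query_id, "seq": seq,
                    "total": len(pieces), "data": piece})
        for seq, piece in enumerate(pieces)
    ]


def decode_payloads(payloads):
    """Reassemble one batch from its payloads, which may arrive in any order.

    This sketch does not deduplicate chunks from retried invocations.
    """
    msgs = sorted((json.loads(p) for p in payloads), key=lambda m: m["seq"])
    if not msgs or len(msgs) != msgs[0]["total"]:
        raise ValueError("incomplete batch: waiting for more payloads")
    blob = "".join(m["data"] for m in msgs)
    return json.loads(zlib.decompress(base64.b64decode(blob)))
```

Because each chunk carries its sequence number and the total count, the receiver needs no coordinator to know when a batch is complete.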

Abstract

Existing serverless data analytics systems rely on external storage services such as S3 for data shuffling and communication between cloud functions. While this approach provides the elasticity benefits of serverless computing, it incurs additional latency and cost overheads. We present Flock, a novel cloud-native streaming query engine that leverages the on-demand scalability of FaaS platforms for real-time data analytics. Flock uses function invocation payloads for efficient data exchange, eliminating the need for external storage. This not only reduces latency and cost but also simplifies the architecture by removing the requirement for a centralized coordinator. Flock employs a template-based approach to dynamically create cloud functions for each query stage, and a function group mechanism to handle data aggregation and shuffling. It supports both SQL and DataFrame APIs, making it easy to use. Our evaluation shows that Flock provides significant performance gains and cost savings compared to existing serverless and serverful streaming systems. It is 10-20x cheaper than Apache Flink while achieving similar latency and throughput.
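
For aggregation, every record with a given group key must reach the same stateful function, no matter which upstream function produced it. The sketch below illustrates this with a group of functions and stable hash partitioning. It is a minimal illustration under stated assumptions, not Flock's function group mechanism. The `q1-agg-NN` naming scheme, the helpers and the `GroupMember` class are all hypothetical. The aggregation is a simple running SUM, and the sketch is written in Python for brevity.

```python
import zlib
from collections import defaultdict


def group_member(key, group_size, prefix="q1-agg"):
    """Name of the group function that owns `key` (hypothetical naming scheme).

    CRC32 gives every upstream function the same answer. Python's built-in
    hash() on strings is salted per process, so it would not.
    """
    return f"{prefix}-{zlib.crc32(key.encode('utf-8')) % group_size:02d}"


def shuffle(rows, key_col, group_size):
    """Hash-partition one upstream function's output by destination function."""
    parts = defaultdict(list)
    for row in rows:
        parts[group_member(str(row[key_col]), group_size)].append(row)
    return dict(parts)


class GroupMember:
    """State kept by one function in the group: a running SUM per key."""

    def __init__(self):
        self.sums = defaultdict(float)

    def on_invoke(self, rows, key_col, val_col):
        # Each invocation payload carries only rows whose keys hash to this member.
        for row in rows:
            self.sums[row[key_col]] += row[val_col]
```

A usage example follows. Two upstream functions each call `shuffle(batch, "auction", 4)` and send every partition to the member it names. The partial sums for a given auction then accumulate in exactly one member. The sketch keeps each member's state in an in-memory object and does not model how the platform places concurrent invocations.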

Paper Structure (29 sections, 2 equations, 9 figures, 3 tables)

Figures (9)

  • Figure 1: System Architecture.
  • Figure 2: Generic Function and Template Specialization.
  • Figure 3: The duration charge comparison.
  • Figure 4: Cloud Function Group.
  • Figure 5: Multi-level Shuffling.
  • ...and 4 more figures