Odyssey: An End-to-End System for Pareto-Optimal Serverless Query Processing

Shyam Jesalpura, Shengda Zhu, Amir Shaikhha, Antonio Barbalace, Boris Grot

TL;DR

Odyssey tackles the challenge of planning serverless data analytics with an end-to-end pipeline that integrates a serverless-aware planner, a query-agnostic cost model, and a hybrid execution engine. It tames the enormous configuration space by exploiting stage-confined dependencies and by applying Incremental Pareto Boundary Search, which keeps frontier sizes near-constant, and it accounts for cold starts and storage throttling to maintain predictive accuracy. Empirical results on TPC-H at scales up to 10TB show that Odyssey's knee-point configurations consistently outperform AWS Athena in latency or cost, with planning overhead under 5% of total execution time. The work offers a practical, scalable path toward automated, serverless-native analytics, and the authors plan to open-source the system to enable reproducibility and adoption.

Abstract

Running data analytics queries on serverless (FaaS) workers has been shown to be cost- and performance-efficient for a variety of real-world scenarios, including intermittent query arrival patterns, sudden load spikes and management challenges that afflict managed VM clusters. Alas, existing serverless data analytics works focus primarily on the serverless execution engine and assume the existence of a "good" query execution plan or rely on user guidance to construct such a plan. Meanwhile, even simple analytics queries on serverless have a huge space of possible plans, with vast differences in both performance and cost among plans. This paper introduces Odyssey, an end-to-end serverless-native data analytics pipeline that integrates a query planner, cost model and execution engine. Odyssey automatically generates and evaluates serverless query plans, utilizing state space pruning heuristics and a novel search algorithm to identify Pareto-optimal plans that balance cost and performance with low latency even for complex queries. Our evaluations demonstrate that Odyssey accurately predicts both monetary cost and latency, and consistently outperforms AWS Athena on cost and/or latency.
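To make the notion of Pareto-optimal plans concrete, the following is a minimal sketch of filtering candidate plans down to a latency/cost frontier and picking a knee point from it. It is not Odyssey's search algorithm: the plan tuples, the sample numbers, and the names `pareto_frontier` and `knee_point` are hypothetical, and the knee is chosen here by an assumed rule (smallest normalized distance to the ideal corner), which may differ from the definition used in the paper.

```python
import math
from typing import List, Tuple

# Hypothetical plan record: (name, predicted latency in seconds, predicted cost in USD).
Plan = Tuple[str, float, float]


def pareto_frontier(plans: List[Plan]) -> List[Plan]:
    """Return the plans not dominated in (latency, cost), sorted by latency."""
    frontier: List[Plan] = []
    best_cost = float("inf")
    # After sorting by latency (then cost), a plan is non-dominated only if
    # it is strictly cheaper than every plan that is at least as fast.
    for plan in sorted(plans, key=lambda p: (p[1], p[2])):
        if plan[2] < best_cost:
            frontier.append(plan)
            best_cost = plan[2]
    return frontier


def knee_point(frontier: List[Plan]) -> Plan:
    """Pick the frontier plan closest to the ideal corner (assumed knee rule)."""
    lats = [p[1] for p in frontier]
    costs = [p[2] for p in frontier]
    # Guard against a zero span when the frontier has a single point.
    lat_span = (max(lats) - min(lats)) or 1.0
    cost_span = (max(costs) - min(costs)) or 1.0

    def distance(p: Plan) -> float:
        return math.hypot((p[1] - min(lats)) / lat_span,
                          (p[2] - min(costs)) / cost_span)

    return min(frontier, key=distance)


# Made-up candidate plans, for illustration only.
plans = [
    ("A", 10.0, 5.0),
    ("B", 20.0, 3.0),
    ("C", 25.0, 4.0),  # dominated by B (slower and more expensive)
    ("D", 40.0, 1.0),
    ("E", 12.0, 6.0),  # dominated by A
]

frontier = pareto_frontier(plans)
knee = knee_point(frontier)
```

With these made-up numbers the frontier consists of plans A, B and D, and the assumed knee rule selects B, the plan that gives up a little latency for a large cost reduction.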

Paper Structure

This paper contains 33 sections, 13 equations, 13 figures, 2 tables, 2 algorithms.

Figures (13)

  • Figure 1: Basic end-to-end query processing flow.
  • Figure 2: Cost and performance spectrum of query plans for TPC-H Q4 on 1TB dataset.
  • Figure 3: Overall design of Odyssey.
  • Figure 4: Neighbor-confined effects and local Pareto frontier.
  • Figure 5: Pareto frontier predicted by Odyssey, actual measurements for selected configs and Athena for Q4 at SF 1K.
  • ...and 8 more figures