CrypQ: A Database Benchmark Based on Dynamic, Ever-Evolving Ethereum Data

Vincent Capol, Yuxi Liu, Haibo Xiu, Jun Yang

TL;DR

This paper introduces CrypQ, a database benchmark built on dynamic, public Ethereum blockchain data. CrypQ offers a high-volume, ever-evolving dataset that reflects the unpredictable nature of a real and active cryptocurrency market. As an example, the paper demonstrates its utility in evaluating cost-based query optimizers on complex, evolving data distributions with real-world skewness and dependencies.

Abstract

Modern database systems are expected to handle dynamic data whose characteristics may evolve over time. Many popular database benchmarks are limited in their ability to evaluate this dynamic aspect of database systems. Those that use synthetic data generators often fail to capture the complexity and unpredictable nature of real data, while most real-world datasets are static, and it is difficult to create high-volume, realistic updates for them. This paper introduces CrypQ, a database benchmark leveraging dynamic, public Ethereum blockchain data. CrypQ offers a high-volume, ever-evolving dataset reflecting the unpredictable nature of a real and active cryptocurrency market. We detail CrypQ's schema, procedures for creating data snapshots and update sequences, and a suite of relevant SQL queries. As an example, we demonstrate CrypQ's utility in evaluating cost-based query optimizers on complex, evolving data distributions with real-world skewness and dependencies.

Paper Structure

This paper contains 13 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: CrypQ schema. Components of the primary key for each table are shaded; UNIQUE keys are not marked. Arrows go from foreign keys to the primary keys they reference; two lines are dashed because they do not reference primary keys.
  • Figure 2: Accuracy of cardinality estimation for Scenario 1. Q-errors with respect to true cardinalities are shown as labels on the lines.
  • Figure 3: Accuracy of cardinality estimation for Scenario 2. Q-errors with respect to true cardinalities are shown as labels on the lines.
  • Figure 4: Details of query plans.
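
The captions for Figures 2 and 3 report Q-errors for cardinality estimates. As a reading aid, the sketch below computes the Q-error in its commonly used form, the larger of estimate/true and true/estimate, so that 1.0 means a perfect estimate. The function name `q_error` and the clamping of both values to at least one row are assumptions made for illustration; this excerpt does not state the exact convention used in the paper.

```python
def q_error(estimated: float, actual: float) -> float:
    """Q-error of a cardinality estimate: max(est/actual, actual/est).

    Assumption (not taken from the paper): both values are clamped to
    at least 1 row so that zero cardinalities do not divide by zero.
    """
    est = max(float(estimated), 1.0)
    act = max(float(actual), 1.0)
    return max(est / act, act / est)


if __name__ == "__main__":
    # An estimate of 100 rows against 1000 true rows is off by a factor of 10,
    # and so is an estimate of 1000 rows against 100 true rows.
    print(q_error(100, 1000))
    print(q_error(1000, 100))
```

Because the measure is symmetric, underestimates and overestimates of the same factor receive the same score, which makes it convenient for labeling lines in plots like those described above.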