CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases

Yannis Chronis, Yawen Wang, Yu Gan, Sami Abu-El-Haija, Chelsea Lin, Carsten Binnig, Fatma Özcan

TL;DR

CardBench addresses the lack of a diverse, large-scale benchmark for learned cardinality estimation (CE) in relational databases. It provides thousands of queries across 20 real-world datasets in two training data configurations, plus an open-source pipeline for collecting statistics, generating queries, and producing annotated query graphs. The study evaluates Graph Neural Network (GNN) and Graph Transformer approaches under instance-based, zero-shot, and fine-tuned regimes. It finds that zero-shot generalization is challenging for joins, but that fine-tuning with modest data can achieve accuracy comparable to instance-based models while reducing training overhead. By providing extensive datasets and tooling, CardBench enables systematic progress in pre-trained CE methods and invites the ML and DB communities to extend it to more complex workloads.

Abstract

Cardinality estimation is crucial for enabling high query performance in relational databases. Recently, learned cardinality estimation models have been proposed to improve accuracy, but there is no systematic benchmark or dataset that allows researchers to evaluate the progress made by new learned approaches or to develop new learned approaches systematically. In this paper, we release a benchmark for learned cardinality estimation containing thousands of queries over 20 distinct real-world databases. In contrast to other initial benchmarks, our benchmark is much more diverse and can be used for training and testing learned models systematically. Using this benchmark, we explored whether learned cardinality estimation can be transferred to an unseen dataset in a zero-shot manner. We trained GNN-based and transformer-based models to study the problem in three setups: (1) instance-based, (2) zero-shot, and (3) fine-tuned. Our results show that while zero-shot cardinality estimation gives promising results on simple single-table queries, accuracy drops as soon as joins are added. However, we show that with fine-tuning, we can still utilize pre-trained models for cardinality estimation, significantly reducing training overheads compared to instance-specific models. We are open-sourcing our scripts to collect statistics and to generate queries and training datasets, in order to foster more extensive research, including from the ML community, on the important problem of cardinality estimation and, in particular, to improve on recent directions such as pre-trained cardinality estimation.

Paper Structure

This paper contains 25 sections, 1 equation, 6 figures, 2 tables.

Figures (6)

  • Figure 1: CardBench Benchmark Creation Steps. The scripts to create the benchmark, along with the training data (query graphs), are released as open source.
  • Figure 2: Box plots of P50 (left) and P95 (right) q-errors of different model configurations for queries on a single table, aggregated across 20 test datasets.
  • Figure 3: Box plots of P50 (left) and P95 (right) q-errors of different model configurations for queries containing a binary join, aggregated across 20 test datasets.
  • Figure 4: Box plots of P50 (left) and P95 (right) q-errors of instance-based vs. fine-tuned models of GNN (top) and Transformer (bottom) with varying training sample size for queries on a single table, aggregated across 20 test datasets.
  • Figure 5: Box plots of P50 (left) and P95 (right) q-errors of instance-based vs. fine-tuned models of GNN (top) and Transformer (bottom) with varying training sample size for queries containing a binary join, aggregated across 20 test datasets.
  • ...and 1 more figure
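
The figure captions above report P50 and P95 q-errors. As a minimal sketch of how such a summary could be computed, the Python below assumes the definition of q-error commonly used in the cardinality estimation literature, max(estimate / truth, truth / estimate). The function names, the clamping of cardinalities to a floor of 1, and the nearest-rank percentile are illustrative assumptions and are not taken from the CardBench scripts.

```python
import math


def q_error(estimated: float, actual: float, floor: float = 1.0) -> float:
    """Multiplicative error between an estimated and a true cardinality.

    Assumption: both values are clamped to `floor` so that empty results
    do not cause a division by zero. This clamp is illustrative only.
    """
    est = max(estimated, floor)
    act = max(actual, floor)
    return max(est / act, act / est)


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(pairs: list[tuple[float, float]]) -> dict[str, float]:
    """Return the P50 and P95 q-errors for (estimated, actual) pairs."""
    errors = [q_error(est, act) for est, act in pairs]
    return {"p50": percentile(errors, 50), "p95": percentile(errors, 95)}


if __name__ == "__main__":
    # Hypothetical (estimated, actual) cardinalities for four queries.
    sample = [(10, 100), (5, 5), (2, 1), (1, 8)]
    print(summarize(sample))
```

A q-error of 1 means the estimate is exact, and the metric penalizes over- and underestimation symmetrically, which is why the median (P50) and tail (P95) are reported separately.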