Grid-AR: A Grid-based Booster for Learned Cardinality Estimation and Range Joins

Damjan Gjurovski, Angjela Davitkova, Sebastian Michel

TL;DR

An advancement in cardinality estimation is proposed by augmenting autoregressive models with a traditional grid structure and presenting an algorithm that enables the estimator to calculate cardinality estimates for range join queries efficiently.

Abstract

We propose an advancement in cardinality estimation by augmenting autoregressive models with a traditional grid structure. The novel hybrid estimator addresses the limitations of autoregressive models by creating a smaller representation of continuous columns and by incorporating a batch execution for queries with range predicates, as opposed to an iterative sampling approach. The suggested modification markedly improves the execution time of the model for both training and prediction, reduces memory consumption, and does so with minimal decline in accuracy. We further present an algorithm that enables the estimator to calculate cardinality estimates for range join queries efficiently. To validate the effectiveness of our cardinality estimator, we conduct and present a comprehensive evaluation considering state-of-the-art competitors using three benchmark datasets -- demonstrating vast improvements in execution times and resource utilization.
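The grid side of the hybrid estimator can be illustrated with a small sketch. The code below is not the authors' implementation: the class name `Grid`, the equal-width cell layout, the uniformity assumption inside a cell, and the use of exact per-cell counts are all assumptions made for illustration. In Grid-AR the per-cell cardinalities would come from the autoregressive model, which is trained on the categorical columns together with the grid cell id. The sketch only shows how a range predicate over continuous columns can be answered in one pass over the overlapping cells rather than by iterative sampling.

```python
from collections import Counter


class Grid:
    """Hypothetical equal-width grid over continuous columns (illustration only)."""

    def __init__(self, points, cells_per_dim):
        self.dims = len(points[0])
        self.k = cells_per_dim
        self.lo = [min(p[d] for p in points) for d in range(self.dims)]
        self.hi = [max(p[d] for p in points) for d in range(self.dims)]
        # Cell width per dimension; fall back to 1.0 for a constant column.
        self.width = [((self.hi[d] - self.lo[d]) / self.k) or 1.0
                      for d in range(self.dims)]
        # Exact tuple counts per cell stand in for the AR model's per-cell estimates.
        self.counts = Counter(self.cell_of(p) for p in points)

    def cell_of(self, point):
        """Map a point to its cell id (a tuple of per-dimension bucket indexes)."""
        cell = []
        for d, v in enumerate(point):
            idx = int((v - self.lo[d]) / self.width[d])
            cell.append(min(self.k - 1, max(0, idx)))  # clamp the upper border
        return tuple(cell)

    def estimate(self, ranges):
        """Estimate how many points fall in the box given as [(low, high), ...].

        Assumes values are uniformly distributed inside each cell, so a cell
        contributes its count scaled by the fraction of its volume that the
        query box covers.
        """
        total = 0.0
        for cell, count in self.counts.items():
            fraction = 1.0
            for d, (q_lo, q_hi) in enumerate(ranges):
                c_lo = self.lo[d] + cell[d] * self.width[d]
                c_hi = c_lo + self.width[d]
                overlap = max(0.0, min(q_hi, c_hi) - max(q_lo, c_lo))
                fraction *= overlap / self.width[d]
            total += count * fraction
        return total
```

Because only the cell id is handed to the autoregressive model, the model sees a small discrete domain in place of the raw continuous values, which matches the abstract's stated motivation of a smaller representation of continuous columns.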

Paper Structure

This paper contains 22 sections, 5 equations, 6 figures, 9 tables, 2 algorithms.

Figures (6)

  • Figure 1: Left and bottom right: 2D four-cell grid over Manhattan's restaurant ratings and AR model with coordinates replaced by grid cells; Top right: space of possible solutions.
  • Figure 2: Creation and querying of Grid-AR. Creation of the grid structure for the continuous attributes and an autoregressive model for the categorical ones together with the assigned grid cell id (part 2). For an example query, the Grid-AR parts responsible for the different types of query predicates are used for producing cardinality estimates (part 3).
  • Figure 3: Example range join query. The grid structure with values is shown on the left-hand side and the comparison between qualifying grid cells on the right-hand side.
  • Figure 4: Memory consumption in megabytes considering the size of both the estimator and the dictionary mapping.
  • Figure 5: Estimation time in milliseconds (left) and memory consumption in megabytes (right) when varying the total number of grid cells (Payment).
  • ...and 1 more figures