AutoCE: An Accurate and Efficient Model Advisor for Learned Cardinality Estimation

Jintao Zhang; Chao Zhang; Guoliang Li; Chengliang Chai

TL;DR

AutoCE addresses the challenge of selecting the most effective learned cardinality estimation model for a given dataset by encoding dataset features as graphs and training a similarity-aware encoder via deep metric learning. It combines offline data generation/labeling, a graph-encoded representation, incremental learning with data augmentation, and a KNN-based predictor to recommend models, with online adaptation to handle distribution shifts. Integrated into PostgreSQL, AutoCE delivers substantial improvements in end-to-end query performance (about 27% on average) and gains in CE accuracy and efficacy (roughly 2.1x and 4.2x respectively). The approach demonstrates a practical and scalable path to robust, data-driven CE model selection across diverse workloads and data distributions.

Abstract

Cardinality estimation (CE) plays a crucial role in many database-related tasks such as query generation, cost estimation, and join ordering. Numerous learned CE models have emerged recently. However, no single CE model performs best across datasets with different data distributions. To support data-intensive applications with accurate and efficient cardinality estimation, it is important to have an approach that can judiciously and efficiently select the most suitable CE model for an arbitrary dataset. In this paper, we study the new problem of selecting the best CE models for a variety of datasets. This problem is challenging because it is hard to capture the relationship between the characteristics of a dataset and the performance of different models. To address this problem, we propose a model advisor, named AutoCE, which can adaptively select the best model for a dataset. The main contribution of AutoCE is learning-based model selection, where deep metric learning is used to learn a recommendation model and incremental learning is proposed to reduce the labeling overhead and improve model robustness. We have integrated AutoCE into PostgreSQL and evaluated its impact on query optimization. The results show that AutoCE achieved the best end-to-end query performance (27% better) and outperformed the baselines in accuracy (2.1 times better) and efficacy (4.2 times better).
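To make the recommendation step more concrete, the following is a minimal sketch of a KNN-based model recommender over dataset embeddings. It is not the authors' implementation: the function name `recommend_model`, the use of Euclidean distance, the averaging of neighbor scores, the "higher score is better" convention, and the toy data are all assumptions made for illustration. In AutoCE the embeddings would come from the learned graph encoder; here they are hand-written vectors.

```python
import numpy as np


def recommend_model(query_emb, train_embs, train_scores, model_names, k=3):
    """Pick the model with the best mean score among the k nearest labeled datasets.

    All names and conventions here are hypothetical (higher score = better).
    """
    query_emb = np.asarray(query_emb, dtype=float)
    train_embs = np.asarray(train_embs, dtype=float)
    train_scores = np.asarray(train_scores, dtype=float)

    # Euclidean distance from the query embedding to every labeled embedding.
    dists = np.linalg.norm(train_embs - query_emb, axis=1)

    # Indices of the k closest labeled datasets.
    nearest = np.argsort(dists)[:k]

    # Average the per-model scores of those neighbors and take the best model.
    mean_scores = train_scores[nearest].mean(axis=0)
    return model_names[int(np.argmax(mean_scores))]


# Toy example: four labeled datasets, two candidate CE models.
train_embs = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]]
train_scores = [[0.9, 0.2], [0.8, 0.3], [0.1, 0.7], [0.2, 0.9]]
model_names = ["model_a", "model_b"]

print(recommend_model([0.05, 0.0], train_embs, train_scores, model_names, k=2))
```

The sketch relies on the embedding space placing datasets with similar model performance close together, which is the property the deep metric learning stage is meant to provide.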

Paper Structure (35 sections, 14 equations, 13 figures, 5 tables, 2 algorithms)

Introduction
Problem Statement
Overview of AutoCE
Training Data Generation
Graph Encoder Learning
Incremental Learning
Model Recommendation
Dataset Generation and Labeling
Dataset Generation
Single Table Generation
Multi-Table Generation
Dataset Labeling
Model Training and Testing
Score Normalization
Model Training and Inference
...and 20 more sections

Figures (13)

Figure 1: Experiment of CE models over different datasets.
Figure 2: An example of AutoCE: after offline training, it selects a tailored cardinality estimation model for an arbitrary dataset and specified metrics.
Figure 3: An overview of AutoCE on model recommendation, including data preparation (Stage 1), training (Stages 2-3), and recommendation (Stage 4).
Figure 4: Feature engineering for a dataset D, including the processes of feature extraction and graph modeling.
Figure 5: Learning effect of graph contrastive learning.
...and 8 more figures

Theorems & Definitions (10)

Example 1: Motivation
Example 2: A Working Example of AutoCE
Definition 1
Example 3
Example 4
Definition 2
Definition 3
Example 5
Definition 4
Definition 5
