AutoCE: An Accurate and Efficient Model Advisor for Learned Cardinality Estimation
Jintao Zhang, Chao Zhang, Guoliang Li, Chengliang Chai
TL;DR
AutoCE addresses the problem of selecting the most effective learned cardinality estimation (CE) model for a given dataset. It encodes dataset features as graphs and trains a similarity-aware encoder via deep metric learning. The pipeline combines offline data generation and labeling, the graph-encoded representation, incremental learning with data augmentation, and a KNN-based predictor that recommends models, with online adaptation to handle distribution shifts. Integrated into PostgreSQL, AutoCE is reported to improve query optimization performance by 27% and to outperform the baselines in CE accuracy (2.1 times better) and efficacy (4.2 times better). The results suggest that data-driven CE model selection is practical across diverse workloads and data distributions.
Abstract
Cardinality estimation (CE) plays a crucial role in many database-related tasks such as query generation, cost estimation, and join ordering. Recently, numerous learned CE models have emerged. However, no single CE model performs best across datasets with different data distributions. To support data-intensive applications with accurate and efficient cardinality estimation, it is important to have an approach that can judiciously and efficiently select the most suitable CE model for an arbitrary dataset. In this paper, we study the new problem of selecting the best CE model for a variety of datasets. This problem is challenging because it is hard to capture the relationship between the characteristics of a dataset and the performance of different models. To address it, we propose a model advisor, named AutoCE, which adaptively selects the best model for a dataset. The main contribution of AutoCE is learning-based model selection: deep metric learning is used to learn a recommendation model, and incremental learning is proposed to reduce the labeling overhead and improve model robustness. We have integrated AutoCE into PostgreSQL and evaluated its impact on query optimization. The results show that AutoCE achieved the best performance (27% better) and outperformed the baselines in accuracy (2.1 times better) and efficacy (4.2 times better).
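The recommendation step can be illustrated with a minimal sketch of a KNN-based predictor over dataset embeddings. This is not the authors' implementation. It assumes the trained encoder has already mapped each labeled dataset to a fixed-length vector, and that each labeled dataset carries one score per candidate CE model (higher is better). The function name `recommend_model`, the model names `model_a` and `model_b`, the Euclidean distance, the score-summing rule, and all numbers are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend_model(query_embedding, labeled, k=3):
    """Recommend a CE model for a new dataset (illustrative assumption).

    labeled: list of (embedding, {model_name: score}) pairs, where the
    scores are hypothetical per-model quality labels (higher is better).
    """
    if not labeled:
        raise ValueError("need at least one labeled dataset")
    # Keep the k labeled datasets whose embeddings are closest to the query.
    neighbors = sorted(
        labeled, key=lambda item: euclidean(query_embedding, item[0])
    )[:k]
    # Sum each model's score over the neighbors.
    totals = defaultdict(float)
    for _, scores in neighbors:
        for model, score in scores.items():
            totals[model] += score
    # Highest total wins; ties go to the alphabetically first model name.
    return max(sorted(totals), key=lambda m: totals[m])


# Hypothetical embeddings and scores for four labeled datasets.
labeled = [
    ([0.0, 0.0], {"model_a": 0.9, "model_b": 0.4}),
    ([0.1, 0.0], {"model_a": 0.8, "model_b": 0.5}),
    ([1.0, 1.0], {"model_a": 0.3, "model_b": 0.9}),
    ([0.9, 1.1], {"model_a": 0.2, "model_b": 0.8}),
]

print(recommend_model([0.05, 0.0], labeled, k=2))  # model_a
print(recommend_model([1.0, 1.0], labeled, k=2))   # model_b
```

The sketch relies on the property that deep metric learning aims for: datasets on which the same models perform well should land close together in the embedding space, so a nearest-neighbor lookup over labeled datasets is enough to transfer a recommendation to an unseen dataset.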
