Learned Cardinalities: Estimating Correlated Joins with Deep Learning
Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter Boncz, Alfons Kemper
TL;DR
This work tackles cardinality estimation for query optimization by introducing MSCN, a multi-set convolutional network that represents a query as three sets: tables, joins, and predicates. MSCN applies a shared MLP to every element of a set and averages the results within each set, which makes the model permutation-invariant and compact. Tables can additionally carry bitmaps over materialized samples, which lets the model learn join-crossing correlations and handle 0-tuple situations, in which no sampled tuple qualifies a predicate. The model is trained on synthetically generated queries labeled with their true cardinalities, obtained from the actual data, and uses sample bitmaps as additional input features. On the IMDb dataset, MSCN competes with and often surpasses state-of-the-art sampling-based methods while using far less data. The results show robustness in challenging cases and point to promising extensions, including more complex predicates, uncertainty estimation, and handling of updates, making MSCN a feasible ML-based alternative to traditional cardinality estimation techniques.
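The set-based design described above can be made concrete with a small forward pass. What follows is a minimal sketch in pure Python, not the authors' implementation. It assumes two-layer ReLU MLPs per set, randomly initialized (untrained) weights, a sigmoid output read as a normalized log-cardinality, and a hypothetical one-hot featurization. The class `MSCNSketch` and all helper functions are illustrative names.

```python
import math
import random


def relu(xs):
    return [max(0.0, x) for x in xs]


def linear(weights, bias, xs):
    # weights holds one row per output unit
    return [sum(w * x for w, x in zip(row, xs)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def init_mlp(n_in, n_hidden, rng):
    # Assumed two-layer per-element MLP: n_in -> n_hidden -> n_hidden
    return [init_layer(n_in, n_hidden, rng), init_layer(n_hidden, n_hidden, rng)]


def apply_mlp(layers, xs):
    for weights, bias in layers:
        xs = relu(linear(weights, bias, xs))
    return xs


def set_module(layers, elements):
    # The same MLP is applied to every element, then outputs are averaged,
    # so the result does not depend on the order of the elements.
    if not elements:
        # Assumption of this sketch: an empty set (e.g. no joins) maps to zeros
        return [0.0] * len(layers[-1][1])
    outs = [apply_mlp(layers, e) for e in elements]
    return [sum(col) / len(outs) for col in zip(*outs)]


class MSCNSketch:
    def __init__(self, table_dim, join_dim, pred_dim, hidden=8, seed=0):
        rng = random.Random(seed)
        self.table_mlp = init_mlp(table_dim, hidden, rng)
        self.join_mlp = init_mlp(join_dim, hidden, rng)
        self.pred_mlp = init_mlp(pred_dim, hidden, rng)
        self.out_hidden = init_layer(3 * hidden, hidden, rng)
        self.out_final = init_layer(hidden, 1, rng)

    def forward(self, tables, joins, predicates):
        # Concatenate the three pooled set representations
        pooled = (set_module(self.table_mlp, tables)
                  + set_module(self.join_mlp, joins)
                  + set_module(self.pred_mlp, predicates))
        h = relu(linear(*self.out_hidden, pooled))
        z = linear(*self.out_final, h)[0]
        return 1.0 / (1.0 + math.exp(-z))  # normalized estimate in (0, 1)


# Hypothetical featurization: 2 tables, 1 join,
# predicate = column one-hot (2) + operator one-hot for =, <, > (3) + value (1)
model = MSCNSketch(table_dim=2, join_dim=1, pred_dim=6)
tables = [[1.0, 0.0], [0.0, 1.0]]
joins = [[1.0]]
preds = [[1.0, 0.0, 0.0, 1.0, 0.0, 0.4],   # column 0 < 0.4
         [0.0, 1.0, 1.0, 0.0, 0.0, 0.9]]   # column 1 = 0.9
print(model.forward(tables, joins, preds))
```

Because each set is pooled by averaging, listing the tables or predicates in a different order yields the same estimate, and a query with more or fewer predicates still maps to a fixed-size input for the output MLP.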
Abstract
We describe a new deep learning approach to cardinality estimation. MSCN is a multi-set convolutional network, tailored to representing relational query plans, that employs set semantics to capture query features and true cardinalities. MSCN builds on sampling-based estimation, addressing its weaknesses when no sampled tuples qualify a predicate and when capturing join-crossing correlations. Our evaluation of MSCN using a real-world dataset shows that deep learning significantly enhances the quality of cardinality estimation, which is the core problem in query optimization.
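To illustrate the sampling weakness mentioned in the abstract, the sketch below evaluates a conjunctive predicate on a materialized sample and records one bit per sampled tuple. The sample, the column name, the table size, and the helpers `sample_bitmap`, `sampling_estimate`, and `table_features` are all hypothetical. Appending the bitmap to a table's one-hot vector is a simplified assumption about how a model like MSCN could consume it.

```python
import operator

OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def sample_bitmap(sample_rows, predicates):
    """One bit per sampled tuple: 1 if the tuple satisfies all (conjunctive) predicates."""
    return [int(all(OPS[op](row[col], val) for col, op, val in predicates))
            for row in sample_rows]


def sampling_estimate(bitmap, table_size):
    """Plain sampling estimate: scale the qualifying fraction up to the full table."""
    return table_size * sum(bitmap) / len(bitmap)


def table_features(table_one_hot, bitmap):
    """Hypothetical table element: table one-hot followed by the sample bitmap."""
    return table_one_hot + [float(b) for b in bitmap]


# Hypothetical 5-tuple sample of a movie table with 1000 rows
sample = [{"production_year": y} for y in (1995, 2003, 2010, 2012, 2015)]
recent = sample_bitmap(sample, [("production_year", ">", 2005)])
old = sample_bitmap(sample, [("production_year", "<", 1990)])  # 0-tuple situation
print(recent, sampling_estimate(recent, 1000))  # [0, 0, 1, 1, 1] 600.0
print(old, sampling_estimate(old, 1000))        # [0, 0, 0, 0, 0] 0.0
print(table_features([1.0, 0.0], recent))       # [1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
```

In the second case no sampled tuple qualifies, so pure sampling estimates zero rows even though the full table may contain matches. A learned model that also receives the predicate's column, operator, and value can still produce a non-zero estimate.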
