Cardinality Estimation on Hyper-relational Knowledge Graphs
Fei Teng, Haoyang Li, Shimin Di, Lei Chen
TL;DR
This work tackles cardinality estimation (CE) on hyper-relational knowledge graphs (HKGs) by introducing diverse, unbiased query benchmarks and a qualifier-aware graph neural network (HRQE). HRQE integrates qualifiers directly through qualifier-aware message passing, qualifier completion with a conditional variational autoencoder (CVAE), and adaptive multi-layer combination, coupled with a data augmentation strategy to improve generalization. Empirical results across three HKGs show that HRQE consistently outperforms state-of-the-art CE methods in both accuracy and efficiency, with ablations confirming the value of each component. The study provides practical benchmarks and a scalable CE framework that can enhance query optimization and motif distribution prediction in complex HKGs.
Abstract
Cardinality estimation (CE) predicts the number of results a query will return without executing it, and is an important indicator for query optimization. Recently, CE for queries over knowledge graphs (KGs) of triple facts has achieved great success. To represent facts more precisely, researchers have proposed hyper-relational KGs (HKGs), which attach qualifiers to a triple fact to provide additional context. However, existing CE methods for KGs, such as sampling-based and summary-based methods, perform unsatisfactorily on HKGs due to the complexity of qualifiers. Learning-based CE methods do not use qualifier information when learning query representations, which leads to poor performance. Moreover, only one CE benchmark exists for HKG queries, and it covers only a limited set of query patterns. This lack of querysets over HKGs is a bottleneck for comprehensively investigating CE on HKGs. In this work, we first construct diverse and unbiased hyper-relational querysets over three popular HKGs for investigating CE. We also propose a novel qualifier-aware graph neural network (GNN) model that effectively incorporates qualifier information and adaptively combines the outputs of multiple GNN layers to accurately predict cardinality. Our experiments demonstrate that our model outperforms all state-of-the-art CE methods on three benchmarks over popular HKGs.
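
To make the problem setting concrete, the following is a minimal sketch of what a CE method is asked to predict: the exact cardinality of a query pattern over a toy HKG, computed here by brute-force execution. It is not the HRQE model or the benchmark format. The `Fact` class, the `?`-prefix convention for variables, the entity and relation names, and the choice to count matching facts (rather than variable bindings) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """A hyper-relational fact: a main triple plus (relation, value) qualifiers."""
    subject: str
    relation: str
    obj: str
    qualifiers: tuple = ()  # e.g. (("degree", "phd"),)


def is_var(term: str) -> bool:
    # Assumed convention: query variables start with "?".
    return term.startswith("?")


def term_matches(pattern_term: str, fact_term: str) -> bool:
    return is_var(pattern_term) or pattern_term == fact_term


def matches(pattern: Fact, fact: Fact) -> bool:
    """True if the fact satisfies the main triple and every pattern qualifier."""
    main_ok = (
        term_matches(pattern.subject, fact.subject)
        and term_matches(pattern.relation, fact.relation)
        and term_matches(pattern.obj, fact.obj)
    )
    if not main_ok:
        return False
    # Each qualifier in the pattern must be matched by some qualifier of the fact.
    return all(
        any(term_matches(qr, fr) and term_matches(qv, fv)
            for fr, fv in fact.qualifiers)
        for qr, qv in pattern.qualifiers
    )


def exact_cardinality(pattern: Fact, facts: list) -> int:
    """Ground-truth cardinality by executing the query (counts matching facts)."""
    return sum(1 for f in facts if matches(pattern, f))


# Toy HKG with made-up entities.
hkg = [
    Fact("alice", "educated_at", "uni_a", (("degree", "bsc"), ("major", "math"))),
    Fact("alice", "educated_at", "uni_b", (("degree", "phd"),)),
    Fact("bob", "educated_at", "uni_a", (("degree", "phd"),)),
    Fact("bob", "works_at", "uni_b"),
]

triple_only = Fact("?p", "educated_at", "?u")
with_qualifier = Fact("?p", "educated_at", "?u", (("degree", "phd"),))

print(exact_cardinality(triple_only, hkg))
print(exact_cardinality(with_qualifier, hkg))
```

The two queries share the same main triple pattern but differ in cardinality once the qualifier constraint is added, which illustrates why an estimator that ignores qualifiers can be far off on HKG queries. A CE method aims to predict these counts without scanning the facts.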
