Cardinality Estimation on Hyper-relational Knowledge Graphs

Fei Teng, Haoyang Li, Shimin Di, Lei Chen

TL;DR

This work tackles cardinality estimation on hyper-relational knowledge graphs (HKGs) by introducing diverse, unbiased query benchmarks and a qualifier-aware graph neural network (HRQE). HRQE directly integrates qualifiers via qualifier-aware message passing, qualifier completion with a CVAE, and adaptive multi-layer combination, coupled with a data augmentation strategy to improve generalization. Empirical results across three HKGs show HRQE consistently outperforms state-of-the-art CE methods in both accuracy and efficiency, with ablations confirming the value of each component. The study provides practical benchmarks and a scalable CE framework that can enhance query optimization and motif distribution predictions in complex HKGs.

Abstract

Cardinality estimation (CE) for a query estimates the number of results without executing the query, and is an effective indicator in query optimization. Recently, CE for queries over knowledge graphs (KGs) with triple facts has achieved great success. To represent facts more precisely, researchers have proposed hyper-relational KGs (HKGs), which attach qualifiers to a triple fact to provide additional context. However, existing CE methods, such as sampling and summary methods over KGs, perform unsatisfactorily on HKGs due to the complexity of qualifiers. Learning-based CE methods do not utilize qualifier information to learn query representations accurately, leading to poor performance. Moreover, there is only one CE benchmark for HKG queries, and it covers only a limited set of query patterns. This lack of querysets over HKGs is a bottleneck for comprehensively investigating CE problems on HKGs. In this work, we first construct diverse and unbiased hyper-relational querysets over three popular HKGs for investigating CE. We also propose a novel qualifier-aware graph neural network (GNN) model that effectively incorporates qualifier information and adaptively combines outputs from multiple GNN layers to accurately predict the cardinality. Our experiments demonstrate that our model outperforms all state-of-the-art CE methods over three benchmarks on popular HKGs.
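
To make the quantity being estimated concrete, the following minimal sketch computes the exact cardinality of a small hyper-relational query by brute force. It is an illustration under stated assumptions, not the paper's query generation algorithm: the toy facts, the names `is_var`, `unify`, `match_fact` and `cardinality`, the `?`-prefix convention for variables, the restriction to one value per qualifier relation, and the choice to count distinct variable bindings are all hypothetical simplifications.

```python
from typing import Dict, List, Optional, Tuple

# A hyper-relational fact: (subject, relation, object, qualifiers).
# Assumption: each qualifier relation maps to a single value.
Fact = Tuple[str, str, str, Dict[str, str]]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Terms starting with '?' are query variables (illustrative convention)."""
    return term.startswith("?")


def unify(term: str, value: str, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `term` matches `value`, or return None."""
    if is_var(term):
        if term in binding:
            return binding if binding[term] == value else None
        extended = dict(binding)
        extended[term] = value
        return extended
    return binding if term == value else None


def match_fact(pattern: Fact, fact: Fact, binding: Binding) -> Optional[Binding]:
    """Match one fact pattern against one fact, including its qualifiers."""
    current: Optional[Binding] = binding
    for term, value in zip(pattern[:3], fact[:3]):
        current = unify(term, value, current)
        if current is None:
            return None
    # Every qualifier in the pattern must be present in the fact.
    for q_rel, q_term in pattern[3].items():
        if q_rel not in fact[3]:
            return None
        current = unify(q_term, fact[3][q_rel], current)
        if current is None:
            return None
    return current


def cardinality(query: List[Fact], facts: List[Fact]) -> int:
    """Count distinct variable bindings that satisfy all fact patterns."""
    bindings: List[Binding] = [{}]
    for pattern in query:
        bindings = [
            extended
            for b in bindings
            for fact in facts
            if (extended := match_fact(pattern, fact, b)) is not None
        ]
    return len({tuple(sorted(b.items())) for b in bindings})


# Toy HKG (made-up entities, for illustration only).
facts: List[Fact] = [
    ("alice", "educated_at", "uni_a", {"degree": "phd", "major": "physics"}),
    ("alice", "educated_at", "uni_b", {"degree": "bsc", "major": "math"}),
    ("bob", "educated_at", "uni_a", {"degree": "phd", "major": "physics"}),
    ("bob", "award", "prize_x", {"year": "2020"}),
    ("alice", "award", "prize_x", {"year": "2021"}),
]

plain_query: List[Fact] = [("?p", "educated_at", "?u", {})]
qualified_query: List[Fact] = [("?p", "educated_at", "?u", {"degree": "phd"})]
chain_query: List[Fact] = [
    ("?p", "educated_at", "?u", {"degree": "phd"}),
    ("?p", "award", "prize_x", {"year": "2020"}),
]
```

On this toy graph, adding the `degree` qualifier narrows the plain query from three bindings to two, and the two-pattern chain narrows it further to one. This shows why an estimator that ignores qualifiers would tend to overestimate.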

Paper Structure

This paper contains 44 sections, 2 theorems, 8 equations, 8 figures, 6 tables, 1 algorithm.

Key Result

Theorem 1

The query generation algorithm can obtain the exact cardinality for each query in $O(|\mathcal{V}| \cdot |\mathcal{E}^Q| \cdot |\mathcal{E}|)$ time.

Figures (8)

  • Figure 1: Five query patterns and query generation
  • Figure 2: Given a query $Q$ with graph form $G^Q$, all atoms in $G^Q$ are first initialized to embeddings. The initialized $G^Q$ is then fed into our qualifier-aware GNN encoder. Specifically, $G^Q$ is passed through $K$ message passing layers; in each layer, message passing is performed after qualifier completion on each fact pattern. The node embeddings of $G^Q$ at different layers are adaptively combined to generate the final representation. Finally, an MLP decoder computes the estimated cardinality from the query representation. In the training phase, a data augmentation strategy first modifies the edges/qualifiers of $G^Q$ to generate augmented training data. During training, we preserve the relative magnitude between the predicted cardinality of the augmented query graph and the cardinality of $Q$.
  • Figure 3: Q-Error boxplots of varying GNN layer number over three datasets
  • Figure 4: Q-Error boxplots of varying $\lambda$ over three datasets
  • Figure 5: Q-Error boxplots grouping by query pattern over three datasets
  • ...and 3 more figures
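
The boxplots listed above report Q-Error. As a reference point, here is a minimal sketch of that metric as it is commonly defined in the cardinality estimation literature: the larger of the two ratios between estimated and true cardinality. The function name `q_error` and the `floor` parameter, which clamps both values to at least 1 to avoid division by zero, are assumptions made for this illustration; the paper's exact handling of zero cardinalities is not stated in this overview.

```python
def q_error(estimated: float, true: float, floor: float = 1.0) -> float:
    """Return max(est/true, true/est) after clamping both values to `floor`.

    A value of 1.0 means a perfect estimate; larger values mean the estimate
    is off by that multiplicative factor, in either direction.
    """
    est = max(estimated, floor)  # assumed clamp, avoids division by zero
    tru = max(true, floor)
    return max(est / tru, tru / est)


# Under- and over-estimation by the same factor are penalized equally.
underestimate = q_error(10, 100)
overestimate = q_error(100, 10)
perfect = q_error(50, 50)
```

Because the metric is a ratio, it treats a tenfold underestimate and a tenfold overestimate identically, which is why it is usually shown on a logarithmic scale in boxplots.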

Theorems & Definitions (3)

  • Definition 1: Cardinality Estimation on HKG Query
  • Theorem 1
  • Theorem 2