Cardinality Estimation on Hyper-relational Knowledge Graphs

Fei Teng; Haoyang Li; Shimin Di; Lei Chen

TL;DR

This work tackles cardinality estimation on hyper-relational knowledge graphs (HKGs) by introducing diverse, unbiased query benchmarks and a qualifier-aware graph neural network (HRQE). HRQE directly integrates qualifiers via qualifier-aware message passing, qualifier completion with a CVAE, and adaptive multi-layer combination, coupled with a data augmentation strategy to improve generalization. Empirical results across three HKGs show HRQE consistently outperforms state-of-the-art CE methods in both accuracy and efficiency, with ablations confirming the value of each component. The study provides practical benchmarks and a scalable CE framework that can enhance query optimization and motif distribution predictions in complex HKGs.

Abstract

Cardinality estimation (CE) estimates the number of results of a query without executing it, and is an effective indicator in query optimization. Recently, CE for queries over knowledge graphs (KGs) with triple facts has achieved great success. To represent facts more precisely, researchers have proposed hyper-relational KGs (HKGs), which attach qualifiers to a triple fact to provide additional context. However, existing CE methods over KGs, such as sampling and summary methods, perform unsatisfactorily on HKGs due to the complexity of qualifiers. Learning-based CE methods do not use qualifier information to learn accurate query representations, which leads to poor performance. In addition, there is only one CE benchmark for HKG queries, and it covers only a limited set of query patterns. This lack of querysets over HKGs is a bottleneck for comprehensively investigating CE problems on HKGs. In this work, we first construct diverse and unbiased hyper-relational querysets over three popular HKGs for investigating CE. We also propose a novel qualifier-aware graph neural network (GNN) model that effectively incorporates qualifier information and adaptively combines the outputs of multiple GNN layers to accurately predict the cardinality. Our experiments demonstrate that our model outperforms all state-of-the-art CE methods on three benchmarks over popular HKGs.
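To make the quantity being estimated concrete, the following is a minimal sketch of computing the exact cardinality of a hyper-relational query by brute force. It is not the paper's query generation algorithm or the HRQE model. The fact layout `(subject, relation, object, qualifiers)`, the `?`-prefixed variable convention, the helper names, and the toy facts are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Assumed toy representation: a fact is (subject, relation, object, qualifiers),
# where qualifiers is a frozenset of (qualifier_relation, qualifier_value) pairs.
Qualifiers = FrozenSet[Tuple[str, str]]
Fact = Tuple[str, str, str, Qualifiers]
Binding = Dict[str, str]


def unify(term: str, value: str, binding: Binding) -> Optional[Binding]:
    """Match one pattern term against a value; terms starting with '?' are variables."""
    if term.startswith("?"):
        if term in binding:
            return binding if binding[term] == value else None
        extended = dict(binding)
        extended[term] = value
        return extended
    return binding if term == value else None


def match_pattern(pattern: Fact, fact: Fact, binding: Binding) -> List[Binding]:
    """Return every extension of `binding` under which `pattern` matches `fact`."""
    current: Optional[Binding] = binding
    for term, value in zip(pattern[:3], fact[:3]):
        current = unify(term, value, current)
        if current is None:
            return []
    bindings = [current]
    # Every qualifier pair in the pattern must match some qualifier pair of the fact.
    for q_rel, q_val in pattern[3]:
        extended = []
        for b in bindings:
            for f_rel, f_val in fact[3]:
                b_rel = unify(q_rel, f_rel, b)
                if b_rel is None:
                    continue
                b_val = unify(q_val, f_val, b_rel)
                if b_val is not None:
                    extended.append(b_val)
        bindings = extended
    return bindings


def cardinality(query: List[Fact], facts: List[Fact]) -> int:
    """Count distinct variable bindings that satisfy all fact patterns in the query."""
    bindings: List[Binding] = [{}]
    for pattern in query:
        bindings = [b2 for b in bindings for f in facts
                    for b2 in match_pattern(pattern, f, b)]
    return len({frozenset(b.items()) for b in bindings})


# Toy HKG, for illustration only.
facts: List[Fact] = [
    ("Einstein", "educated_at", "ETH_Zurich", frozenset({("degree", "BSc")})),
    ("Einstein", "educated_at", "Univ_Zurich", frozenset({("degree", "PhD")})),
    ("Curie", "educated_at", "Univ_Paris", frozenset({("degree", "PhD")})),
]
plain_query = [("?p", "educated_at", "?u", frozenset())]
qualified_query = [("?p", "educated_at", "?u", frozenset({("degree", "PhD")}))]
print(cardinality(plain_query, facts), cardinality(qualified_query, facts))
```

On this toy graph the qualifier `(degree, PhD)` reduces the result count from three to two, which is the kind of qualifier-dependent effect a CE model over HKGs has to capture without executing the query.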

Paper Structure (44 sections, 2 theorems, 8 equations, 8 figures, 6 tables, 1 algorithm)

Introduction
Preliminary and Related Works
Hyper-relational Knowledge Graphs (HKGs)
Cardinality Estimation on HKG Query
Hyper-relational Knowledge Graph Query and Query Pattern
Cardinality Estimation on Hyper-relational Knowledge Graph Query
Hyper-relational Queryset construction
Hyper-relational Queryset
HKG Queryset and Cardinality Generation
Generated Queryset Data Statistics
A Qualifier-aware GNN
HKG Query Embedding Initialization
HKGs Query Encoder
Qualifier-aware Message Passing
Qualifier Completion
...and 29 more sections

Key Result

Theorem 1

Algorithm 1 (query generation) obtains the exact cardinality for each query in $O(|\mathcal{V}| \cdot |\mathcal{E}^Q| \cdot |\mathcal{E}|)$ time.

Figures (8)

Figure 1: Five query patterns and query generation
Figure 2: Given a query $Q$ with graph form $G^Q$, all atoms in $G^Q$ are first initialized to embeddings. The initialized $G^Q$ is then fed into our qualifier-aware GNN encoder. Specifically, $G^Q$ is passed through $K$ message passing layers; in each layer, message passing is performed after qualifier completion on each fact pattern. The node embeddings of $G^Q$ at different layers are adaptively combined to generate the final representation. Finally, an MLP decoder computes the estimated cardinality from the query representation. In the training phase, a data augmentation strategy first modifies the edges/qualifiers of $G^Q$ to generate augmented training data. Training preserves the relative magnitude between the predicted cardinality of the augmented query graph and the cardinality of $Q$.
Figure 3: Q-Error boxplots of varying GNN layer number over three datasets
Figure 4: Q-Error boxplots of varying $\lambda$ over three datasets
Figure 5: Q-Error boxplots grouping by query pattern over three datasets
...and 3 more figures
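The Q-Error reported in Figures 3 to 5 is commonly defined as the larger of the two ratios between the estimated and the true cardinality. The sketch below follows that common definition; the function name and the clamping of both values to a floor of 1 (to avoid division by zero) are assumptions, and the paper's exact convention may differ.

```python
def q_error(estimated: float, true: float, floor: float = 1.0) -> float:
    """Q-Error = max(est / true, true / est), with both values clamped to `floor`.

    The clamp is an assumed convention so that zero cardinalities are well defined.
    """
    est = max(estimated, floor)
    tru = max(true, floor)
    return max(est / tru, tru / est)


# Under- and over-estimation by the same factor give the same Q-Error.
print(q_error(10, 100), q_error(100, 10), q_error(5, 5))
```

A Q-Error of 1 means a perfect estimate, and the metric is symmetric in under- and over-estimation, which makes it suitable for boxplots over queries whose true cardinalities span several orders of magnitude.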

Theorems & Definitions (3)

Definition 1: Cardinality Estimation on HKG Query
Theorem 1
Theorem 2
