Cardinality Estimation over Knowledge Graphs with Embeddings and Graph Neural Networks

Tim Schwabe, Maribel Acosta

TL;DR

GNCE tackles the challenge of estimating conjunctive query cardinalities over knowledge graphs by fusing semantically informed KG embeddings with a permutation-invariant GNN in a lightweight regression setup. The method yields accurate predictions, generalizes to unseen entities, and delivers fast online inference suitable for query optimization. Extensive experiments on SWDF, LUBM, YAGO, and Wikidata show that GNCE outperforms sampling-based, summary-based, and other learned baselines, with robust inductive performance and favorable runtime. The work also discusses practical deployment considerations and points to future enhancements such as handling filters and literals, active learning, and task-specific KG embeddings.

Abstract

Cardinality estimation over knowledge graphs (KGs) is crucial for query optimization, yet it remains challenging due to the semi-structured nature and complex correlations of typical KGs. In this work, we propose GNCE, a novel approach that leverages knowledge graph embeddings and graph neural networks (GNNs) to accurately predict the cardinality of conjunctive queries. GNCE first creates semantically meaningful embeddings for all entities in the KG; these embeddings are then integrated into the given query, which a GNN processes to estimate the query's cardinality. We evaluate GNCE on several KGs in terms of q-Error and demonstrate that it outperforms state-of-the-art approaches based on sampling, summaries, and (machine) learning in estimation accuracy, while also having lower execution time and fewer parameters. Additionally, we show that GNCE can inductively generalize to unseen entities, making it suitable for dynamic query processing scenarios. Our approach has the potential to significantly improve query optimization and related applications that rely on accurate cardinality estimates of conjunctive queries.
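
Accuracy is reported in terms of q-Error. The minimal Python sketch below uses the standard definition: the factor by which an estimate deviates from the true cardinality, in either direction, so a perfect estimate scores 1 and over- and underestimates are penalized symmetrically. Clamping both values to at least 1 is an assumed convention for handling empty results or zero estimates, not something stated in this summary.

```python
def q_error(estimate: float, true_card: float) -> float:
    """Factor by which `estimate` deviates from `true_card` (always >= 1)."""
    # Assumed convention: clamp to >= 1 so zero estimates or empty results
    # do not cause a division by zero.
    est = max(estimate, 1.0)
    true = max(true_card, 1.0)
    return max(est / true, true / est)

print(q_error(50, 200))   # 4.0  (underestimate by 4x)
print(q_error(200, 50))   # 4.0  (overestimate by 4x)
print(q_error(120, 120))  # 1.0  (exact)
```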

Paper Structure (32 sections, 8 equations, 9 figures, 9 tables)

Figures (9)

  • Figure 1: GNCE Overview. For a KG, embeddings are learned for existing atoms. Next, a GNN is trained using sampled queries, which are represented with the embeddings. Finally, the trained model estimates the cardinality of new queries.
  • Figure 2: Architecture of the GNN for cardinality estimation. An initial query is represented by entity- and predicate embeddings and the connectivity between those atoms. For clarity, the TPN aggregation is only shown for one node. After node aggregation, the complete query graph is represented by summing all node embeddings. Finally, the query cardinality is estimated by mapping the query representation through a multilayer perceptron to a single dimension.
  • Figure 3: Boxplots of q-Errors of star queries
  • Figure 4: q-Errors for star-shaped queries, grouped by the true cardinalities of the queries
  • Figure 5: Boxplots of q-Errors for path queries
  • ...and 4 more figures
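
Figure 2 describes a data flow: represent each query atom by its embedding, aggregate information over the query graph, sum all node vectors into one query vector, and map that vector through an MLP to a single value. The plain-Python sketch below mirrors only that data flow, with random, untrained weights and random stand-ins for the pretrained KG embeddings. It does not implement the paper's TPN message passing (Definition 5), whose exact form is not given here; it substitutes a generic one-round sum aggregation over triple patterns. The class name `QueryGNN`, the dimensions, and the shared placeholder vector for variables are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]
Triple = Tuple[str, str, str]

DIM = 4     # embedding size (hypothetical; real KG embeddings are larger)
HIDDEN = 8  # MLP hidden size (hypothetical)

def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]

def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]

class QueryGNN:
    """Forward pass only: message passing -> sum readout -> MLP to one scalar."""

    def __init__(self, seed: int = 0):
        rng = random.Random(seed)
        self.w_self = rand_matrix(DIM, DIM, rng)  # transforms a node's own state
        self.w_msg = rand_matrix(DIM, DIM, rng)   # transforms neighbour + predicate
        self.w_hidden = rand_matrix(HIDDEN, DIM, rng)
        self.w_out = rand_matrix(1, HIDDEN, rng)

    def embed(self, term: str, emb: Dict[str, Vector]) -> Vector:
        # Hypothetical convention: every variable shares one placeholder vector.
        return [1.0] * DIM if term.startswith("?") else emb[term]

    def forward(self, query: List[Triple], emb: Dict[str, Vector]) -> float:
        nodes = {t for s, _, o in query for t in (s, o)}
        h = {n: self.embed(n, emb) for n in nodes}
        # One round of generic sum aggregation (NOT the paper's TPN scheme).
        agg = {n: [0.0] * DIM for n in nodes}
        for s, p, o in query:
            pred = emb[p]
            agg[o] = add(agg[o], add(h[s], pred))  # message subject -> object
            agg[s] = add(agg[s], add(h[o], pred))  # message object -> subject
        h = {n: relu(add(matvec(self.w_self, h[n]), matvec(self.w_msg, agg[n])))
             for n in nodes}
        # Sum readout: invariant to the order of nodes and triple patterns.
        graph = [0.0] * DIM
        for vec in h.values():
            graph = add(graph, vec)
        hidden = relu(matvec(self.w_hidden, graph))
        return matvec(self.w_out, hidden)[0]  # single output dimension

rng = random.Random(42)
emb = {t: [rng.uniform(-1, 1) for _ in range(DIM)] for t in ["knows", "worksAt", "acme"]}
model = QueryGNN(seed=1)
q1 = [("?x", "knows", "?y"), ("?x", "worksAt", "acme")]
q2 = list(reversed(q1))
print(model.forward(q1, emb), model.forward(q2, emb))
```

Because the readout sums node vectors, listing the triple patterns in a different order yields the same output up to floating-point rounding; this is the permutation invariance the TL;DR refers to.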

Theorems & Definitions (5)

  • Definition 1: Knowledge Graph
  • Definition 2: Query Graph
  • Definition 3: Query Solution
  • Definition 4: Query Cardinality
  • Definition 5: TPN Message Passing
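
Definitions 3 and 4 are only named here. Assuming they follow the usual semantics for conjunctive (basic graph pattern) queries, where a solution maps each variable to a KG term so that every triple pattern becomes a triple in the KG, and the cardinality is the number of such solutions, a naive reference evaluator can be sketched as below. The names `match`, `solutions`, and `cardinality` and the "?" prefix for variables are hypothetical. Exact counts like these are the kind of label a learned estimator is trained against (Figure 1), but brute-force enumeration is far too slow to run inside a query optimizer.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

def is_var(term: str) -> bool:
    # Hypothetical convention: variables are written with a leading "?".
    return term.startswith("?")

def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` maps onto `triple`; None if impossible."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term not in new:
                new[p_term] = t_term
            elif new[p_term] != t_term:
                return None  # variable already bound to a different term
        elif p_term != t_term:
            return None  # constant in the pattern does not match the triple
    return new

def solutions(query: List[Triple], kg: Set[Triple],
              binding: Optional[Binding] = None) -> List[Binding]:
    """Backtracking enumeration of all mappings that send every pattern into the KG."""
    binding = {} if binding is None else binding
    if not query:
        return [binding]
    first, rest = query[0], query[1:]
    found: List[Binding] = []
    for triple in kg:
        extended = match(first, triple, binding)
        if extended is not None:
            found.extend(solutions(rest, kg, extended))
    return found

def cardinality(query: List[Triple], kg: Set[Triple]) -> int:
    return len(solutions(query, kg))

kg = {
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
    ("bob", "worksAt", "acme"),
}
star = [("?x", "knows", "?y"), ("?x", "worksAt", "acme")]
path = [("?x", "knows", "?y"), ("?y", "knows", "?z")]
print(cardinality(star, kg))  # 3
print(cardinality(path, kg))  # 1
```

The star query has three solutions (alice/bob, alice/carol, bob/carol for ?x/?y), while the path query has only one (alice, bob, carol), since carol knows nobody.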