Improving Graph Embeddings in Machine Learning Using Knowledge Completion with Validation in a Case Study on COVID-19 Spread

Rosario Napoli, Gabriele Morabito, Antonio Celesti, Massimo Villari, Maria Fazio

TL;DR

The paper tackles the limitation that standard graph embeddings miss latent, implicit knowledge in sparse datasets. It introduces a dedicated Knowledge Completion (KC) phase that models scalable transitive relationships with decay-based inference to complete the graph prior to embedding. The KC-enhanced GML pipeline materially alters embedding space geometry and centrality dynamics, as demonstrated on a temporal COVID-19 contact network, improving the expressiveness of both Node2Vec and GraphSAGE embeddings. The findings suggest KC is a transformative pre-processing step, with practical implications for more accurate propagation and centrality analysis in knowledge-rich graphs, and motivate further study on downstream task performance and scalability.

Abstract

The rise of graph-structured data has driven major advances in Graph Machine Learning (GML), where graph embeddings (GEs) map features from Knowledge Graphs (KGs) into vector spaces, enabling tasks like node classification and link prediction. However, since GEs are derived from explicit topology and features, they may miss crucial implicit knowledge hidden in seemingly sparse datasets, affecting graph structure and their representation. We propose a GML pipeline that integrates a Knowledge Completion (KC) phase to uncover latent dataset semantics before embedding generation. Focusing on transitive relations, we model hidden connections with decay-based inference functions, reshaping graph topology, with consequences on embedding dynamics and aggregation processes in GraphSAGE and Node2Vec. Experiments show that our GML pipeline significantly alters the embedding space geometry, demonstrating that its introduction is not just a simple enrichment but a transformative step that redefines graph representation quality.
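
The idea of completing a graph with decayed transitive relations before embedding can be illustrated with a minimal sketch. This is not the authors' implementation: the paper's own definitions of path strength and scalable transitive relationships are not reproduced in this summary, so the multiplicative decay rule, the function name `infer_transitive_edges`, and the parameters `decay`, `max_hops`, and `threshold` are all assumptions chosen for illustration.

```python
def infer_transitive_edges(edges, decay=0.5, max_hops=3, threshold=0.1):
    """Infer implicit edges from multi-hop paths (hypothetical decay rule).

    edges: dict mapping a directed pair (u, v) to a weight in (0, 1].
    Assumed strength of a k-hop path: product of its edge weights
    multiplied by decay ** (k - 1). This rule is an assumption,
    not the paper's definition.
    Returns a dict of inferred (u, w) pairs mapped to their best strength.
    """
    adjacency = {}
    for (u, v), weight in edges.items():
        adjacency.setdefault(u, {})[v] = weight

    inferred = {}

    def walk(source, node, strength, hops, seen):
        for nxt, weight in adjacency.get(node, {}).items():
            if nxt in seen:
                continue  # only simple paths, no revisiting
            new_hops = hops + 1
            # Apply the decay factor on every hop after the first one.
            new_strength = strength * weight * (decay if new_hops > 1 else 1.0)
            is_new_edge = new_hops >= 2 and (source, nxt) not in edges
            if is_new_edge and new_strength >= threshold:
                # Keep the strongest path found for each inferred pair.
                if new_strength > inferred.get((source, nxt), 0.0):
                    inferred[(source, nxt)] = new_strength
            if new_hops < max_hops:
                walk(source, nxt, new_strength, new_hops, seen | {nxt})

    for source in adjacency:
        walk(source, source, 1.0, 0, {source})
    return inferred


# Toy contact chain a -> b -> c -> d with unit weights (made-up data).
contacts = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "d"): 1.0}
inferred = infer_transitive_edges(contacts, decay=0.5, max_hops=3, threshold=0.2)
print(inferred)  # {('a', 'c'): 0.5, ('a', 'd'): 0.25, ('b', 'd'): 0.5}
```

In a pipeline of the kind the abstract describes, the inferred pairs would be merged into the edge set before running Node2Vec or GraphSAGE, so that random walks and neighborhood aggregation also see the implicit contacts. The threshold limits how many weak, long-range edges are added, which is one way to keep the completed graph from becoming dense.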

Paper Structure

This paper contains 16 sections, 2 theorems, 28 equations, 7 figures, 1 table.

Key Result

Lemma 1

Let $KG = (V, E, R, \ell_V)$ be a Knowledge Graph and $KG' = (V, E \cup E_{KC}, R, \ell_V)$ the Knowledge Graph resulting from a KC step that infers transitive relationships. Let $u \in V$ be a node, let $\mathcal{W}$ be a Node2Vec algorithm executed on $KG$, and let $\mathcal{W}'$ be the same Nod

Figures (7)

  • Figure 1: Current GML pipeline.
  • Figure 2: Our new GML pipeline.
  • Figure 3: KG Schema.
  • Figure 4: Inferred Contacts.
  • Figure 5: Contact growth between $KG_{\mathrm{raw}}$ and $KG_{\mathrm{KC}}$.
  • ...and 2 more figures

Theorems & Definitions (19)

  • Definition 1: Knowledge Graph
  • Definition 2: KG atomic unit
  • Definition 3: Graph Embedding
  • Definition 4: Knowledge Completion
  • Definition 5: Transitive Relationship
  • Definition 6: Transitive relationship in Knowledge Graphs
  • Definition 7: Path
  • Definition 8: Strength of a path
  • Definition 9: Scalable transitive relationship
  • Definition 10: Transition probability
  • ...and 9 more