Bi-View Embedding Fusion: A Hybrid Learning Approach for Knowledge Graph's Nodes Classification Addressing Problems with Limited Data

Rosario Napoli, Giovanni Lonia, Antonio Celesti, Massimo Villari, Maria Fazio

TL;DR

This work tackles node classification in Knowledge Graphs under data sparsity by proposing Bi-View Embedding Fusion, a hybrid approach that combines global topology from Node2Vec with supervised neighborhood signals from GraphSAGE, augmented by centrality metrics. A learnable per-node fusion mechanism balances the two views to produce expressive embeddings $Z^{(fused)}$, improving generalization without synthetic data. Empirical evaluation on the FAERS KG shows Bi-View achieving higher accuracy and better class separability than either baseline alone, validating the approach under imbalanced, low-feature conditions. The study highlights practical gains for KG-based GML in sparse data settings and outlines avenues for extending the framework to temporal, multi-relational, or cross-domain graphs, as well as to meta-learning-based transfer.

Abstract

Traditional Machine Learning (ML) methods require large amounts of data to perform well, which limits their applicability in sparse or incomplete scenarios and forces the use of additional synthetic data to improve model training. To overcome this challenge, the research community is increasingly turning to Graph Machine Learning (GML), which offers a powerful alternative by exploiting the relationships within data. However, GML also faces limitations, particularly when dealing with Knowledge Graphs (KGs), which can conceal a large amount of information due to their semantic nature. This study introduces Bi-View, a novel hybrid approach that increases the informative content of node features in KGs to generate enhanced Graph Embeddings (GEs), which are used to improve GML models without relying on additional synthetic data. The proposed work combines two complementary GE techniques: Node2Vec, which captures structural patterns through unsupervised random walks, and GraphSAGE, which aggregates neighbourhood information in a supervised way. Node2Vec embeddings are first computed to represent the graph topology, and node features are then enriched with centrality-based metrics, which are used as input for the GraphSAGE model. Moreover, a fusion layer combines the original Node2Vec embeddings with the GraphSAGE-influenced representations, resulting in a dual-perspective embedding space. Such a fusion captures both topological and semantic properties of the graph, enabling the model to exploit informative features that may exist in the dataset but are not explicitly represented. Our approach improves downstream task performance, especially in scenarios with poor initial features, laying the basis for more accurate and precise KG-enhanced GML models.
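The two steps described above, centrality-based feature enrichment and per-node fusion of the two views, can be illustrated with a small sketch. This is not the authors' implementation: degree centrality stands in for the unspecified centrality metrics, the sigmoid gate is one assumed form of a learnable per-node fusion, and the vectors `z_topo` and `z_sage` are hand-written placeholders for Node2Vec and GraphSAGE outputs. All function names are hypothetical.

```python
import math

def degree_centrality(adj):
    """Degree centrality of each node: degree / (n - 1), undirected graph."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def enrich(features, centrality):
    """Append each node's centrality score to its raw feature vector."""
    return {v: features[v] + [centrality[v]] for v in features}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fuse(z_topo, z_sage, w, b):
    """Assumed gated fusion: alpha * z_topo + (1 - alpha) * z_sage.

    alpha is a per-node scalar computed from the concatenation of both views;
    w and b would be learned during training (fixed here for illustration).
    """
    gate_input = z_topo + z_sage  # list concatenation of the two views
    alpha = sigmoid(sum(wi * xi for wi, xi in zip(w, gate_input)) + b)
    fused = [alpha * t + (1.0 - alpha) * s for t, s in zip(z_topo, z_sage)]
    return fused, alpha

# Toy star graph: node 0 is connected to nodes 1 and 2.
adj = {0: [1, 2], 1: [0], 2: [0]}
raw = {0: [1.0], 1: [1.0], 2: [1.0]}  # identical, uninformative raw features

enriched = enrich(raw, degree_centrality(adj))

# Placeholder embeddings for one node (stand-ins for Node2Vec / GraphSAGE).
z_topo, z_sage = [1.0, 0.0], [0.0, 1.0]
fused, alpha = fuse(z_topo, z_sage, w=[0.0] * 4, b=0.0)
```

In this toy graph all three nodes start with the same raw feature, yet after enrichment the hub (node 0) is distinguishable from the leaves because its centrality differs. With zero gate weights the gate is neutral and the fused vector is the plain average of the two views; trained weights would shift that balance node by node.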

Paper Structure

This paper contains 15 sections, 3 theorems, 23 equations, 12 figures.

Key Result

Lemma

Let $h^{(0)}_v$ and $h^{(0)}_u$ be the enhanced GEs of two distinct nodes $v, u \in V$, and let $\ell_V(v)$ and $\ell_V(u)$ be the corresponding non-enhanced GEs. Then, if $\phi(v) \neq \phi(u)$ or $\gamma(v) \neq \gamma(u)$, it follows that:

Figures (12)

  • Figure 1: Bi-View strategy.
  • Figure 2: KG schema for FAERS dataset.
  • Figure 3: Class distribution.
  • Figure 4: Node2Vec
  • Figure 5: GraphSAGE
  • ...and 7 more figures

Theorems & Definitions (17)

  • Definition: Knowledge Graph
  • Definition: Graph Embedding
  • Definition: Transition probability
  • Definition: Transition probability
  • Definition: Node2Vec on Knowledge Graphs as Unsupervised Aggregator
  • Definition: GraphSAGE on Knowledge Graphs as Supervised Aggregator
  • Definition
  • Definition: Centrality Vector
  • Definition: Structurally Enriched Initial Features
  • Definition: GraphSAGE with Structural Feature Injection
  • ...and 7 more