HeteroMILE: a Multi-Level Graph Representation Learning Framework for Heterogeneous Graphs

Yue Zhang, Yuntian He, Saket Gurukar, Srinivasan Parthasarathy

TL;DR

HeteroMILE tackles the scalability gap in heterogeneous graph embedding with a generic multi-level framework that coarsens a large heterogeneous graph, embeds the coarsened graph, and refines the embeddings back to the original graph using a heterogeneous graph convolutional network (HGCN). It offers two coarsening strategies (Jaccard similarity and locality-sensitive hashing, LSH) and a refinement stage that uses an HGCN with weight sharing across levels, which keeps it compatible with existing base methods such as Metapath2Vec and GATNE. The approach reduces runtime substantially (up to ~20x) while maintaining or improving embedding quality on link prediction and node classification across four real-world datasets, including the large OGB_MAG graph. These results indicate that HeteroMILE is a practical, scalable way to learn high-quality embeddings on large heterogeneous graphs without requiring specialized hardware upgrades.
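
To make the Jaccard-based coarsening strategy concrete, the following is a minimal sketch and not the authors' implementation. It assumes that only nodes of the same type may be merged, that similarity is the Jaccard index of the two nodes' neighbor sets, and that matching is greedy with a fixed threshold; the names `jaccard`, `match_nodes`, and `threshold` are hypothetical. The all-pairs loop is quadratic and is used here only for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two neighbor sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_nodes(neighbors, node_type, threshold=0.5):
    """Greedily pair same-type nodes with similar neighborhoods.

    neighbors: dict mapping node -> set of neighbor nodes
    node_type: dict mapping node -> type label
    Returns a dict mapping node -> supernode id (assumed merge rule).
    """
    # Score every same-type pair that clears the threshold.
    pairs = []
    for u, v in combinations(sorted(neighbors), 2):
        if node_type[u] != node_type[v]:
            continue
        score = jaccard(neighbors[u], neighbors[v])
        if score >= threshold:
            pairs.append((score, u, v))

    # Highest similarity first; ties broken by node name for determinism.
    pairs.sort(key=lambda t: (-t[0], t[1], t[2]))

    supernode = {}
    next_id = 0
    for _, u, v in pairs:
        if u in supernode or v in supernode:
            continue  # each node joins at most one pair per level
        supernode[u] = supernode[v] = next_id
        next_id += 1

    # Unmatched nodes become singleton supernodes.
    for u in sorted(neighbors):
        if u not in supernode:
            supernode[u] = next_id
            next_id += 1
    return supernode


# Toy author-paper graph: a1 and a2 co-wrote p1 and p2; a3 wrote p3 alone.
neighbors = {
    "a1": {"p1", "p2"}, "a2": {"p1", "p2"}, "a3": {"p3"},
    "p1": {"a1", "a2"}, "p2": {"a1", "a2"}, "p3": {"a3"},
}
node_type = {n: ("author" if n.startswith("a") else "paper") for n in neighbors}
supernode = match_nodes(neighbors, node_type)
```

In this toy graph, `a1` and `a2` share all their neighbors and are merged, as are `p1` and `p2`, while `a3` and `p3` stay as singletons. An LSH-based variant would replace the all-pairs scoring with hashing of neighbor sets into buckets, so that only nodes in the same bucket are compared.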

Abstract

Heterogeneous graphs are ubiquitous in real-world applications because they can represent various relationships between different types of entities. Therefore, learning embeddings in such graphs is a critical problem in graph machine learning. However, existing solutions for this problem fail to scale to large heterogeneous graphs due to their high computational complexity. To address this issue, we propose a Multi-Level Embedding framework of nodes on a heterogeneous graph (HeteroMILE), a generic methodology that allows contemporary graph embedding methods to scale to large graphs. Before embedding, HeteroMILE repeatedly coarsens the large graph into a smaller one while preserving its backbone structure, which reduces computational cost by avoiding time-consuming processing operations. It then refines the coarsened embedding to the original graph using a heterogeneous graph convolutional neural network. We evaluate our approach using several popular heterogeneous graph datasets. The experimental results show that HeteroMILE can substantially reduce computational time (approximately 20x speedup) and generate an embedding of better quality for link prediction and node classification.
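
The coarsen, embed, refine loop described above can be sketched as a small skeleton. This is a hedged illustration under stated assumptions, not the paper's method: the refinement step here simply copies each supernode's embedding to its member nodes, standing in for the HGCN-based refinement, and the helpers `pair_in_order` and `degree_embed` are hypothetical toy stand-ins for a real coarsening strategy and a real base embedding method.

```python
def multilevel_embed(graph, coarsen, base_embed, levels):
    """Coarsen `levels` times, embed the coarsest graph, project back.

    graph:      dict mapping node -> set of neighbor nodes
    coarsen:    callable(graph) -> (coarser_graph, node -> supernode map)
    base_embed: callable(graph) -> dict mapping node -> list of floats
    """
    mappings = []
    g = graph
    for _ in range(levels):
        g, mapping = coarsen(g)
        mappings.append(mapping)

    # Only the smallest graph is passed to the (expensive) base method.
    emb = base_embed(g)

    # Stand-in refinement: each node inherits its supernode's vector.
    # The paper uses a heterogeneous GCN here instead of plain copying.
    for mapping in reversed(mappings):
        emb = {node: list(emb[sn]) for node, sn in mapping.items()}
    return emb


def pair_in_order(graph):
    """Toy coarsening (hypothetical): merge consecutive nodes in sorted order."""
    mapping = {n: i // 2 for i, n in enumerate(sorted(graph))}
    coarse = {sn: set() for sn in set(mapping.values())}
    for u, nbrs in graph.items():
        for v in nbrs:
            if mapping[u] != mapping[v]:  # drop edges inside a supernode
                coarse[mapping[u]].add(mapping[v])
    return coarse, mapping


def degree_embed(graph):
    """Toy base embedding (hypothetical): one dimension holding the degree."""
    return {n: [float(len(nbrs))] for n, nbrs in graph.items()}


# Path graph 0 - 1 - 2 - 3, coarsened once before embedding.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
embeddings = multilevel_embed(path, pair_in_order, degree_embed, levels=1)
```

Because `coarsen` and `base_embed` are passed in as callables, the skeleton mirrors the framework's claim of being generic with respect to the base embedding method; swapping in a different base method requires no change to the loop itself.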

Paper Structure (24 sections, 6 equations, 8 figures, 4 tables, 1 algorithm)

Figures (8)

  • Figure 1: Overview of HeteroMILE framework
  • Figure 2: Example of matching and merging the nodes
  • Figure 3: Refinement Process of HeteroMILE
  • Figure 4: Performance of HeteroMILE with metapath2vec as the base embedding method as the number of coarsening levels increases, as indicated by the color scheme. The first row shows node classification (Micro-F1), the second row shows link prediction (AUROC), and the third row shows running time on a logarithmic scale. The running-time curves of Jacc_WRS and Jacc_max overlap, as do those of LSH (k=128) and LSH (k=256). "level = 0" denotes the original embedding method without HeteroMILE.
  • Figure 5: Performance of HeteroMILE with GATNE as the base embedding method as the number of coarsening levels increases, as indicated by the color scheme. The first row shows node classification (Micro-F1), the second row shows link prediction (AUROC), and the third row shows running time on a logarithmic scale. The running-time curves of Jacc_WRS and Jacc_max overlap, as do those of LSH (k=128) and LSH (k=256). "level = 0" denotes the original embedding method without HeteroMILE.
  • ...and 3 more figures