HeteroMILE: a Multi-Level Graph Representation Learning Framework for Heterogeneous Graphs
Yue Zhang, Yuntian He, Saket Gurukar, Srinivasan Parthasarathy
TL;DR
HeteroMILE addresses the scalability gap in heterogeneous graph embedding with a generic multi-level framework: it coarsens a large heterogeneous graph, runs a base embedding method on the coarsened graph, and refines the resulting embeddings back to the original graph using a heterogeneous graph convolutional network (HGCN). It offers two coarsening strategies (Jaccard similarity and locality-sensitive hashing, LSH) and a refinement stage whose HGCN shares weights across levels, and it is compatible with existing base methods such as Metapath2Vec and GATNE. The approach reduces runtime substantially (up to ~20x) while maintaining or improving embedding quality on link prediction and node classification across four real-world datasets, including the large OGB_MAG graph. These results suggest that HeteroMILE is a practical, scalable way to learn high-quality embeddings on large heterogeneous graphs without specialized hardware upgrades.
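To make the Jaccard-based coarsening idea concrete, the following is a minimal sketch of one plausible reading: nodes of the same type are greedily paired when their neighbor sets overlap strongly. The function names, the `threshold` parameter, and the greedy matching order are assumptions made for illustration and are not taken from the paper; the quadratic pair enumeration is only suitable for a toy graph.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two neighbor sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_same_type_nodes(neighbors, node_type, threshold=0.5):
    """Greedily pair same-type nodes with similar neighborhoods.

    neighbors: dict mapping node -> set of neighbor nodes
    node_type: dict mapping node -> type label
    threshold: hypothetical minimum similarity for merging two nodes
    Returns a dict mapping node -> supernode id.
    """
    # Score every same-type pair; O(n^2), acceptable only for a toy example.
    scored = []
    for u, v in combinations(sorted(neighbors), 2):
        if node_type[u] == node_type[v]:
            s = jaccard(neighbors[u], neighbors[v])
            if s >= threshold:
                scored.append((s, u, v))

    # Most similar pairs first; ties broken by node name for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))

    supernode = {}
    next_id = 0
    for _, u, v in scored:
        if u not in supernode and v not in supernode:
            supernode[u] = supernode[v] = next_id
            next_id += 1

    # Unmatched nodes become singleton supernodes.
    for u in sorted(neighbors):
        if u not in supernode:
            supernode[u] = next_id
            next_id += 1
    return supernode


# Toy author-paper graph: a1 and a2 wrote the same papers, a3 did not.
toy_neighbors = {
    "a1": {"p1", "p2"}, "a2": {"p1", "p2"}, "a3": {"p3"},
    "p1": {"a1", "a2"}, "p2": {"a1", "a2"}, "p3": {"a3"},
}
toy_types = {n: ("author" if n.startswith("a") else "paper") for n in toy_neighbors}
toy_mapping = match_same_type_nodes(toy_neighbors, toy_types)
```

In this toy graph, `a1` and `a2` share a supernode, as do `p1` and `p2`, while `a3` and `p3` stay on their own. Restricting matches to same-type pairs keeps each supernode's type well defined, which is what a heterogeneous coarsening step needs.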
Abstract
Heterogeneous graphs are ubiquitous in real-world applications because they can represent various relationships between different types of entities. Learning embeddings in such graphs is therefore a critical problem in graph machine learning. However, existing solutions fail to scale to large heterogeneous graphs because of their high computational complexity. To address this issue, we propose HeteroMILE, a Multi-Level Embedding framework for nodes on a heterogeneous graph: a generic methodology that allows contemporary graph embedding methods to scale to large graphs. HeteroMILE repeatedly coarsens a large graph into a smaller one while preserving its backbone structure, and only then embeds it, which reduces computational cost by avoiding time-consuming processing on the full graph. It then refines the embedding of the coarsened graph back to the original graph using a heterogeneous graph convolutional neural network. We evaluate our approach on several popular heterogeneous graph datasets. The experimental results show that HeteroMILE can substantially reduce computation time (approximately 20x speedup) and generate embeddings of better quality for link prediction and node classification.
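The coarsen, embed, refine loop described above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: node and edge types are ignored, the base embedding is a random placeholder standing in for a method such as Metapath2Vec, and refinement is a single untrained smoothing step rather than a learned heterogeneous graph convolutional network. All function names are hypothetical.

```python
import numpy as np


def coarsen(adj, assign):
    """Collapse a graph; assign is an (n, m) 0/1 node-to-supernode matrix."""
    return assign.T @ adj @ assign


def base_embed(adj, dim, seed=0):
    """Placeholder for a base embedding method: random vectors per node."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(adj.shape[0], dim))


def refine(adj, assign, coarse_emb):
    """Project supernode embeddings to the finer graph, then smooth once.

    Stand-in for the learned refinement network: one untrained propagation
    step with a row-normalized adjacency matrix that includes self-loops.
    """
    projected = assign @ coarse_emb            # each node copies its supernode
    a_hat = adj + np.eye(adj.shape[0])         # self-loops keep row sums > 0
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)
    return a_hat @ projected


def multilevel_embed(adj, assignments, dim=8):
    """Coarsen level by level, embed the smallest graph, refine back up."""
    graphs = [adj]
    for assign in assignments:
        graphs.append(coarsen(graphs[-1], assign))
    emb = base_embed(graphs[-1], dim)
    for level in range(len(assignments) - 1, -1, -1):
        emb = refine(graphs[level], assignments[level], emb)
    return emb


# Toy example: a 4-node path graph coarsened once into 2 supernodes.
path_adj = np.array([[0, 1, 0, 0],
                     [1, 0, 1, 0],
                     [0, 1, 0, 1],
                     [0, 0, 1, 0]], dtype=float)
path_assign = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
path_emb = multilevel_embed(path_adj, [path_assign], dim=8)
```

The expensive base method only ever sees the smallest graph, which is where the runtime savings come from; the refinement steps are comparatively cheap matrix products on each finer level.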
