Hybrid Graph: A Unified Graph Representation with Datasets and Benchmarks for Complex Graphs

Zehui Li, Xiangyu Zhao, Mingzhu Shen, Guy-Bart Stan, Pietro Liò, Yiren Zhao

TL;DR

This paper introduces hybrid graphs, a unified definition for higher-order graphs, and presents the Hybrid Graph Benchmark (HGB), together with an extensible evaluation framework and a supporting codebase for training and evaluating GNNs on HGB.

Abstract

Graphs are widely used to encapsulate a variety of data formats, but real-world networks often involve complex node relations beyond only being pairwise. While hypergraphs and hierarchical graphs have been developed and employed to account for the complex node relations, they cannot fully represent these complexities in practice. Additionally, though many Graph Neural Networks (GNNs) have been proposed for representation learning on higher-order graphs, they are usually only evaluated on simple graph datasets. Therefore, there is a need for a unified modelling of higher-order graphs, and a collection of comprehensive datasets with an accessible evaluation framework to fully understand the performance of these algorithms on complex graphs. In this paper, we introduce the concept of hybrid graphs, a unified definition for higher-order graphs, and present the Hybrid Graph Benchmark (HGB). HGB contains 23 real-world hybrid graph datasets across various domains such as biology, social media, and e-commerce. Furthermore, we provide an extensible evaluation framework and a supporting codebase to facilitate the training and evaluation of GNNs on HGB. Our empirical study of existing GNNs on HGB reveals various research opportunities and gaps, including (1) evaluating the actual performance improvement of hypergraph GNNs over simple graph GNNs; (2) comparing the impact of different sampling strategies on hybrid graph learning methods; and (3) exploring ways to integrate simple graph and hypergraph information. We make our source code and full datasets publicly available at https://zehui127.github.io/hybrid-graph-benchmark/.

Paper Structure

This paper contains 27 sections, 3 equations, 5 figures, 11 tables.

Figures (5)

  • Figure 1: A hybrid graph (left) can (1) contain multiple node levels, where nodes in adjacent levels can have hierarchical relations, and every node at a lower level of the hierarchy must belong to a node at a higher level; and (2) contain both simple edges (edges that connect exactly two nodes) and hyperedges (edges that can connect two or more nodes). It can be transformed into a hierarchical graph (upper right), a hypergraph (lower right), or even a simple graph by tightening the relevant constraints.
  • Figure 2: Construction of the HGB datasets. The MUSAE datasets are social and knowledge networks, where the hyperedges are constructed from friend circles or mutually linked page groups. The GRAND datasets are gene regulatory networks, where the hyperedges are formed using the positions of genomic elements on the chromosome. The Amazon datasets are product co-review networks, where the hyperedges are built from the clusters of the image embeddings of the products.
  • Figure 3: An overview of the HGB framework. (a) The HGB datasets include 23 real-world hybrid graphs, which preserve both simple and higher-order graph structures. (b) We also build a lightweight training-test framework using PyTorch Lightning and PyTorch Geometric.
  • Figure 4: Evaluating the accuracy of simple graph and hypergraph GNNs on HGB. (a) A scatter plot of the accuracies of hypergraph GNNs against those of simple graph GNNs on each dataset. The black dashed line at $y=x$ is a reference line on which both GNN types perform equally well; a dot above this line means hypergraph GNNs perform better than simple graph GNNs on a given node classification task, and a dot below it means the opposite. The plot indicates that hypergraph GNNs match simple graph GNNs' accuracies on GRAND, underperform on MUSAE, and outperform on Amazon. (b) A bar chart of the accuracies of both GNN types aggregated within MUSAE, GRAND, and Amazon shows that the improvement of hypergraph GNNs over simple graph GNNs on Amazon may not be significant once the standard deviations are taken into account.
  • Figure 5: Node classification accuracy of different sampling techniques on (a) MUSAE-GitHub and (b) MUSAE-Facebook. The charts compare the accuracy of the HyperConv model with three variants of HybridGraphSAINT samplers at different subgraph sizes. The size of the subgraph is measured by the number of nodes. The black dashed line indicates the performance of HyperConv trained on the whole graph without any sampling. The accuracy increases as the subgraph size increases, with both HybridGraphSAINT-RW and HybridGraphSAINT-Node outperforming standard HyperConv. HybridGraphSAINT-RW consistently shows superior performance across different sampling sizes.
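
The edge-level part of the Figure 1 caption can be made concrete with a small sketch. The `HybridGraph` class below and all of its method names are hypothetical; it is not the HGB data format or API, and it leaves out the multi-level node hierarchy. It assumes that "tightening the constraints" can be read as restricting edges to exactly two nodes for the simple graph view, and as treating every simple edge as a size-2 hyperedge for the hypergraph view. The `node_induced` helper is only a guess at what a node-based subgraph sampler might retain; it is not the HybridGraphSAINT implementation.

```python
from dataclasses import dataclass, field


@dataclass
class HybridGraph:
    """Hypothetical toy container holding both edge types over one node set."""
    num_nodes: int
    simple_edges: set = field(default_factory=set)  # frozensets of exactly two nodes
    hyperedges: set = field(default_factory=set)    # frozensets of two or more nodes

    def add_simple_edge(self, u, v):
        if u == v:
            raise ValueError("a simple edge connects two distinct nodes")
        self.simple_edges.add(frozenset((u, v)))

    def add_hyperedge(self, nodes):
        members = frozenset(nodes)
        if len(members) < 2:
            raise ValueError("a hyperedge connects two or more nodes")
        self.hyperedges.add(members)

    def as_hypergraph(self):
        # Every simple edge is also a valid size-2 hyperedge.
        return self.simple_edges | self.hyperedges

    def as_simple_graph(self):
        # Assumed reading: tighten "two or more" to "exactly two" nodes per edge.
        return {e for e in self.as_hypergraph() if len(e) == 2}

    def node_induced(self, keep):
        # Assumed rule: keep simple edges fully inside `keep`; shrink each
        # hyperedge to its surviving members and drop it if fewer than two remain.
        keep = frozenset(keep)
        sub = HybridGraph(self.num_nodes)
        for e in self.simple_edges:
            if e <= keep:
                sub.simple_edges.add(e)
        for h in self.hyperedges:
            rest = h & keep
            if len(rest) >= 2:
                sub.hyperedges.add(rest)
        return sub
```

For example, a graph with simple edges (0, 1) and (1, 2) and a hyperedge over {1, 2, 3, 4} has three edges in its hypergraph view and two in its simple graph view. Other conversions, such as expanding each hyperedge into a clique of pairwise edges, are equally plausible, and the paper may define the transformation differently.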