GraphLand: Evaluating Graph Machine Learning Models on Diverse Industrial Data
Gleb Bazhenov, Oleg Platonov, Liudmila Prokhorenkova
TL;DR
GraphLand addresses the scarcity of diverse, real-world graph benchmarks by introducing 14 industrial graph datasets for node property prediction, enabling evaluation across domains, feature types, and temporal dynamics. The study systematically compares graph-agnostic baselines, graph neural networks (GNNs), and graph foundation models (GFMs), revealing that while attention-based GNNs generally excel, GFMs lag in both scalability and performance on these realistic datasets. A key finding is that temporal distributional shifts and inductive evaluation significantly challenge models, highlighting the need for shift-resilient architectures and truly general GFMs. By providing rich datasets, splits, and baseline results, GraphLand offers a practical platform for advancing industrially relevant graph machine learning (GML) research and evaluation.
Abstract
Although data that can be naturally represented as graphs is widespread in real-world applications across diverse industries, popular graph ML benchmarks for node property prediction only cover a surprisingly narrow set of data domains, and graph neural networks (GNNs) are often evaluated on just a few academic citation networks. This issue is particularly pressing in light of the recent growing interest in designing graph foundation models. These models are supposed to be able to transfer to diverse graph datasets from different domains, and yet the proposed graph foundation models are often evaluated on a very limited set of datasets from narrow applications. To alleviate this issue, we introduce GraphLand: a benchmark of 14 diverse graph datasets for node property prediction from a range of different industrial applications. GraphLand allows evaluating graph ML models on a wide range of graphs with diverse sizes, structural characteristics, and feature sets, all in a unified setting. Further, GraphLand allows investigating previously underexplored research questions, such as how realistic temporal distributional shifts under transductive and inductive settings influence graph ML model performance. To mimic realistic industrial settings, we use GraphLand to compare GNNs with gradient-boosted decision tree (GBDT) models that are popular in industrial applications and show that GBDTs provided with additional graph-based input features can sometimes be very strong baselines. Further, we evaluate currently available general-purpose graph foundation models and find that they fail to produce competitive results on our proposed datasets.
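
The abstract mentions GBDTs "provided with additional graph-based input features" without saying how those features are built. As a rough illustration only, one common way to construct such features is to append the mean of each node's one-hop neighbor features to its own feature vector. The sketch below assumes that approach; the function name `add_neighbor_mean_features`, the undirected treatment of edges, and the zero fill for isolated nodes are all hypothetical choices made for this example and are not taken from the paper.

```python
from collections import defaultdict


def add_neighbor_mean_features(features, edges):
    """Append the mean of one-hop neighbor features to each node's features.

    features: list of equal-length lists of floats, one row per node.
    edges: iterable of (u, v) node-index pairs, treated as undirected
           (an assumption of this sketch).
    """
    num_feats = len(features[0]) if features else 0

    # Build an undirected adjacency structure, ignoring self-loops.
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)

    augmented = []
    for node, own in enumerate(features):
        nbrs = neighbors.get(node, ())
        if nbrs:
            means = [
                sum(features[n][j] for n in nbrs) / len(nbrs)
                for j in range(num_feats)
            ]
        else:
            # Isolated node: nothing to average, so fill with zeros
            # (an arbitrary choice for this illustration).
            means = [0.0] * num_feats
        augmented.append(list(own) + means)
    return augmented


# Toy graph: nodes 0, 1, 2 form a triangle; node 3 is isolated.
features = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
edges = [(0, 1), (0, 2), (1, 2)]
augmented = add_neighbor_mean_features(features, edges)
```

The augmented rows are plain tabular data, so they could be passed to any GBDT library in place of the original node features. Other aggregations (max, min, neighbor counts, multi-hop averages) would fit the same pattern.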
