Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks
Maya Bechler-Speicher, Ben Finkelshtein, Fabrizio Frasca, Luis Müller, Jan Tönshoff, Antoine Siraudin, Viktor Zaverkin, Michael M. Bronstein, Mathias Niepert, Bryan Perozzi, Mikhail Galkin, Christopher Morris
TL;DR
This paper argues that graph learning risks losing relevance due to poor benchmarks that emphasize narrow, often 2D molecular tasks rather than transformative real-world applications. It surveys foundational benchmarking shortcomings, including questionable graph constructions, fragmented evaluations, and a culture of inadequate baselines, and proposes concrete remedies such as focusing on combinatorial optimization, principled graph construction, and robust evaluation protocols. Through empirical investigations, ranging from re-tuning baselines on PCQM4Mv2 to multi-task encoder-processor-decoder pre-training, the authors illustrate that many claimed gains arise from evaluation artifacts rather than true advances in graph understanding. The work advocates for large-scale, diverse graph datasets, hidden-test benchmarks, and closer domain collaboration to develop graph foundation models that generalize across tasks and domains, thereby preserving the field’s practical impact.
Abstract
While machine learning on graphs has demonstrated promise in drug design and molecular property prediction, significant benchmarking challenges hinder its further progress and relevance. Current benchmarking practices often lack focus on transformative, real-world applications, favoring narrow domains like two-dimensional molecular graphs over broader, impactful areas such as combinatorial optimization, relational databases, or chip design. Additionally, many benchmark datasets poorly represent the underlying data, leading to inadequate abstractions and misaligned use cases. Fragmented evaluations and an excessive focus on accuracy further exacerbate these issues, incentivizing overfitting rather than fostering generalizable insights. These limitations have prevented the development of truly useful graph foundation models. This position paper calls for a paradigm shift toward more meaningful benchmarks, rigorous evaluation protocols, and stronger collaboration with domain experts to drive impactful and reliable advances and unlock the potential of graph learning.
